Field Note No. 03  ·  Structure-Centric ML
arXiv: 2603.13339  ·  A. Elmahdi, PhD
When dimensionality reduction is part of the pipeline, this is the stack.

The Low-Dimensional Stack.

Three components — SCOPE, AdaBox, SLCD — operating together to produce structure-centric clusters in two-dimensional spaces. The strongest known pipeline for any application where UMAP, PCA, or t-SNE precede clustering: text analytics, social listening, customer segmentation, and the entire BERTopic ecosystem.

Three components. One end-to-end pipeline.

01

SCOPE

Decomposable supervised evaluation framework

A five-component metric that evaluates clustering quality and identifies which structural property is failing when quality drops: Core Purity, Boundary Recall, Cluster Precision, Noise F1, and Count Accuracy. Compatible with any density-based algorithm. SCOPE is the tuning objective: using it in place of ARI as the optimization target produces clusters that match ground truth more reliably than tuning on ARI itself does.
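As a sketch of how a decomposable metric doubles as a tuning objective, the loop below picks a DBSCAN eps by maximizing a composite score instead of ARI. Note that `toy_scope` is not the published SCOPE: it averages rough proxies for only three of the five components (Count Accuracy, Noise F1, Core Purity), purely to show the shape of the idea.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import f1_score

def toy_scope(y_true, y_pred):
    """Simplified stand-in for SCOPE: mean of three component proxies."""
    k_true = len(set(y_true) - {-1})
    k_pred = len(set(y_pred) - {-1})
    # Count Accuracy proxy: how close is the predicted k to the true k?
    count_acc = min(k_true, k_pred) / max(k_true, k_pred, 1)
    # Noise F1 proxy: agreement on which points are labeled noise (-1).
    noise_f1 = f1_score(y_true == -1, y_pred == -1, zero_division=1.0)
    # Core Purity proxy: majority-class fraction in each predicted cluster.
    purities = [np.bincount(y_true[y_pred == c]).max() / np.sum(y_pred == c)
                for c in set(y_pred) - {-1}]
    core_purity = float(np.mean(purities)) if purities else 0.0
    return (count_acc + noise_f1 + core_purity) / 3.0

X, y = make_blobs(n_samples=600, centers=4, cluster_std=0.6, random_state=0)
# Tune on the composite score rather than on ARI directly.
score, best_eps = max(
    (toy_scope(y, DBSCAN(eps=eps, min_samples=5).fit_predict(X)), eps)
    for eps in (0.1, 0.3, 0.5, 0.8)
)
print(f"best eps={best_eps}, toy SCOPE={score:.3f}")
```

The payoff of a decomposable objective is that when the chosen configuration still scores poorly, the individual components say which structural property to blame.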

02

AdaBox

Two-dimensional structure-centric clustering algorithm

The foundation algorithm of the structure-centric family. Operates on a kNN graph of the two-dimensional data. Produces zero noise points by default. Parameters are scale-invariant — the same configuration works whether the dataset has 1,000 or 1,000,000 points. Validated against HDBSCAN across 111 benchmark datasets.
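AdaBox itself is not reproduced here, but its input is plain: a kNN graph over the 2D points. The sketch below builds that graph with scikit-learn's `kneighbors_graph`. The scale-invariance intuition (our gloss, not a quote from the paper) is that a neighbor count, unlike a distance radius, does not need rescaling as n grows.

```python
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph

# Two-dimensional data of the kind AdaBox consumes.
X, _ = make_moons(n_samples=1_000, noise=0.05, random_state=0)

# kNN graph: each point keeps edges to its 10 nearest neighbors.
# n_neighbors is a count, not a radius, which is the usual route to
# scale-invariant behavior (contrast DBSCAN's eps).
G = kneighbors_graph(X, n_neighbors=10, mode="connectivity",
                     include_self=False)
print(G.shape, G.nnz)  # (1000, 1000) 10000
```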

03

SLCD

Sample-Label-Calibrate-Deploy parameter transfer

Tune AdaBox parameters on a 1,000-point sample. Deploy those exact parameters to a 500,000-point dataset. Quality is preserved — mean Δ ARI of +0.027 across tested datasets, while DBSCAN loses 0.404 and HDBSCAN loses 0.475 in ARI when their parameters are transferred at scale.
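The protocol's shape can be sketched as: tune on a small sample, then deploy the same parameters unchanged. DBSCAN stands in as the clusterer purely so the sketch runs (AdaBox is not publicly importable here); on this toy data with uniform density even DBSCAN transfers, which is exactly the property the note reports breaking down for distance-parameterized methods on real data at scale.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X_full, y_full = make_blobs(n_samples=20_000, centers=5,
                            cluster_std=0.5, random_state=1)

# Sample: tune parameters on a small subset only.
rng = np.random.default_rng(0)
idx = rng.choice(len(X_full), size=1_000, replace=False)
X_s, y_s = X_full[idx], y_full[idx]
_, best_eps = max(
    (adjusted_rand_score(y_s, DBSCAN(eps=eps).fit_predict(X_s)), eps)
    for eps in (0.2, 0.4, 0.8)
)

# Deploy: the sample-tuned parameters are reused on the full data unchanged.
full_ari = adjusted_rand_score(y_full,
                               DBSCAN(eps=best_eps).fit_predict(X_full))
print(f"sample-tuned eps={best_eps}, full-data ARI={full_ari:.3f}")
```

Sampling thins local density, so a radius tuned on the sample is systematically miscalibrated for the full data; count-style parameters sidestep that, which is the claimed mechanism behind AdaBox's clean transfer.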

The low-D stack solves the deployment problem, not just the clustering problem. SCOPE picks the right parameters. AdaBox finds the structure. SLCD scales it without re-tuning.
— Low-D stack architecture

BERTopic pipeline, four datasets, four methods.

The standard text clustering pipeline in production today is BERTopic: sentence-BERT embeddings → UMAP to two dimensions → HDBSCAN. This pipeline powers many social listening platforms, trend detection systems, and topic modeling tools. The benchmark below replaces only the final clustering step. Ada2D is scored on the same UMAP-reduced 2D data that HDBSCAN operates on; AdaHD clusters the original high-dimensional embeddings directly.

ARI and SCOPE heatmap across 4 text datasets and 4 methods
Adjusted Rand Index (left) and SCOPE score (right) across four real-world text datasets. AdaHD wins ARI on three of four datasets and SCOPE on all four. Source: Multi_Dataset_Text_Benchmark.ipynb

Average across all four datasets

Method                           | Avg ARI | Avg SCOPE | Style
AdaHD (high-D structure-centric) | 0.5015  | 0.7604    | Native HD
Ada2D (low-D structure-centric)  | 0.4261  | 0.5766    | UMAP → AdaBox
HDBSCAN* (tuned)                 | 0.3611  | 0.3595    | UMAP → HDBSCAN, hyper-tuned
HDBSCAN (default)                | 0.3516  | 0.3017    | UMAP → HDBSCAN, default
20-Newsgroups 6-category UMAP scatter, ground truth vs HDBSCAN vs HDBSCAN* vs Ada2D vs AdaHD
20 Newsgroups (6 categories, n=5,581) — UMAP 2D projection. HDBSCAN finds k=13 with 8% noise (ARI 0.821); HDBSCAN* fragments to k=50 with 56% noise (ARI 0.225); Ada2D finds k=4 with 0% noise; AdaHD finds k=7 with 0% noise (ARI 0.751). Source: Multi_Dataset_Text_Benchmark.ipynb (img-003)

When clustering quality drops, you know which property failed.

Existing supervised clustering metrics — ARI, NMI, V-Measure — produce a single score. A bad score tells you something is wrong, but not what. SCOPE produces five independent components that each measure a distinct structural property of the result:

Component         | Measures                                | Diagnostic
Core Purity       | How clean are dense interior regions?   | Low → over-merging of distinct clusters
Boundary Recall   | Are points near edges correctly placed? | Low → algorithm is too cautious at edges
Cluster Precision | Do clusters correspond 1:1 with truth?  | Low → fragmentation
Noise F1          | Are noise points correctly identified?  | Low → either over- or under-discarding
Count Accuracy    | Is k correct?                           | Low → wrong number of clusters

A single SCOPE score of 0.55 on a difficult dataset is not informative on its own. But knowing that Core Purity is 0.92 while Cluster Precision is 0.30 tells you exactly that the algorithm is fragmenting good clusters — not over-merging them, not picking up noise. That diagnostic capability is what makes SCOPE useful as a tuning objective.
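That diagnostic reading mechanizes trivially. The `diagnose` helper and its 0.5 threshold below are our illustration, not part of the published metric; the hint strings follow the component table.

```python
def diagnose(components, threshold=0.5):
    """Map low SCOPE components to the failure mode each one indicates."""
    hints = {
        "core_purity": "over-merging of distinct clusters",
        "boundary_recall": "too cautious at edges",
        "cluster_precision": "fragmentation",
        "noise_f1": "over- or under-discarding of noise",
        "count_accuracy": "wrong number of clusters",
    }
    return [hints[name] for name, score in components.items()
            if score < threshold]

# The scenario above: Core Purity 0.92 but Cluster Precision 0.30.
result = diagnose({
    "core_purity": 0.92,
    "boundary_recall": 0.70,
    "cluster_precision": 0.30,
    "noise_f1": 0.81,
    "count_accuracy": 0.60,
})
print(result)  # ['fragmentation']
```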

From a 1,000-point sample to half a million points without re-tuning.

SLCD protocol A results: AdaBox, DBSCAN, HDBSCAN deployment quality
Protocol A: Sample tuning, full-dataset deployment. Across 10 real-world datasets, AdaBox sample-tuned configurations transfer cleanly to the full data (Δ ARI +0.027 on average). DBSCAN loses 0.404 and HDBSCAN loses 0.475 in ARI under the same protocol. SLCD's parameter transfer is uniquely enabled by AdaBox's scale-invariant parameters. Source: AdaBox_KDD2026_Experimental_Notebook (img-056)
+0.027
AdaBox average Δ ARI
Scale-invariant parameters transfer cleanly.
−0.404
DBSCAN average Δ ARI
Distance parameters fail at scale.
−0.475
HDBSCAN average Δ ARI
Mutual reachability distance does not survive transfer.

The Low-D stack is the strongest choice when…

• The pipeline already includes UMAP, PCA, t-SNE, or another dimensionality reduction step.
• The data has under ~30 informative dimensions in its raw form.
• The deployment target requires a 2D visualization of the clustering.
• The application is text/topic clustering using sentence-BERT or similar embeddings (the BERTopic case).
• The application is customer segmentation, behavioral clustering, or trend detection on signal vectors.

For text and social-listening applications, the Low-D stack is a direct drop-in replacement for the HDBSCAN step in any UMAP-then-HDBSCAN pipeline. Integration cost: hours, not weeks.