Field Note No. 03  ·  Structure-Centric ML
arXiv: 2603.13339  ·  A. Elmahdi, PhD
When dimensionality reduction is part of the pipeline, this is the stack.

The Low-Dimensional Stack.

Three components — SCOPE, AdaBox, SLCD — operating together to produce structure-centric clusters in two-dimensional spaces. The strongest known pipeline for any application where UMAP, PCA, or t-SNE precede clustering: text analytics, social listening, customer segmentation, and the entire BERTopic ecosystem.

Three components. One end-to-end pipeline.

01

SCOPE

Decomposable supervised evaluation framework

A five-component metric that evaluates clustering quality and identifies which structural property is failing when quality drops: Core Purity, Boundary Recall, Cluster Precision, Noise F1, and Count Accuracy. Compatible with any density-based algorithm. SCOPE is the tuning objective: using it in place of ARI as the optimization target produces clusters that match ground truth more reliably than tuning on ARI itself does.
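As a sketch of how a decomposable metric doubles as a tuning objective, the loop below picks a DBSCAN eps by maximizing a composite score instead of ARI. Note that `toy_scope` is not the published SCOPE: it averages rough proxies for only three of the five components (Count Accuracy, Noise F1, Core Purity), purely to show the shape of the idea.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import f1_score

def toy_scope(y_true, y_pred):
    """Simplified stand-in for SCOPE: mean of three component proxies."""
    k_true = len(set(y_true) - {-1})
    k_pred = len(set(y_pred) - {-1})
    # Count Accuracy proxy: how close is the predicted k to the true k?
    count_acc = min(k_true, k_pred) / max(k_true, k_pred, 1)
    # Noise F1 proxy: agreement on which points are labeled noise (-1).
    noise_f1 = f1_score(y_true == -1, y_pred == -1, zero_division=1.0)
    # Core Purity proxy: majority-class fraction in each predicted cluster.
    purities = [np.bincount(y_true[y_pred == c]).max() / np.sum(y_pred == c)
                for c in set(y_pred) - {-1}]
    core_purity = float(np.mean(purities)) if purities else 0.0
    return (count_acc + noise_f1 + core_purity) / 3.0

X, y = make_blobs(n_samples=600, centers=4, cluster_std=0.6, random_state=0)
# Tune on the composite score rather than on ARI directly.
score, best_eps = max(
    (toy_scope(y, DBSCAN(eps=eps, min_samples=5).fit_predict(X)), eps)
    for eps in (0.1, 0.3, 0.5, 0.8)
)
print(f"best eps={best_eps}, toy SCOPE={score:.3f}")
```

The payoff of a decomposable objective is that when the chosen configuration still scores poorly, the individual components say which structural property to blame.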

02

AdaBox

Two-dimensional structure-centric clustering algorithm

The foundation algorithm of the structure-centric family. Operates on a kNN graph of the two-dimensional data. Produces zero noise points by default. Parameters are scale-invariant — the same configuration works whether the dataset has 1,000 or 1,000,000 points. Validated against HDBSCAN across 111 benchmark datasets.
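AdaBox itself is not reproduced here, but its input is plain: a kNN graph over the 2D points. The sketch below builds that graph with scikit-learn's `kneighbors_graph`. The scale-invariance intuition (our gloss, not a quote from the paper) is that a neighbor count, unlike a distance radius, does not need rescaling as n grows.

```python
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph

# Two-dimensional data of the kind AdaBox consumes.
X, _ = make_moons(n_samples=1_000, noise=0.05, random_state=0)

# kNN graph: each point keeps edges to its 10 nearest neighbors.
# n_neighbors is a count, not a radius, which is the usual route to
# scale-invariant behavior (contrast DBSCAN's eps).
G = kneighbors_graph(X, n_neighbors=10, mode="connectivity",
                     include_self=False)
print(G.shape, G.nnz)  # (1000, 1000) 10000
```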

03

SLCD

Sample-Label-Calibrate-Deploy parameter transfer

Tune AdaBox parameters on a 1,000-point sample. Deploy those exact parameters to a 500,000-point dataset. Quality is preserved — mean Δ ARI of +0.027 across tested datasets, while DBSCAN loses 0.404 and HDBSCAN loses 0.475 in ARI when their parameters are transferred at scale.
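The protocol's shape can be sketched as: tune on a small sample, then deploy the same parameters unchanged. DBSCAN stands in as the clusterer purely so the sketch runs (AdaBox is not publicly importable here); on this toy data with uniform density even DBSCAN transfers, which is exactly the property the note reports breaking down for distance-parameterized methods on real data at scale.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X_full, y_full = make_blobs(n_samples=20_000, centers=5,
                            cluster_std=0.5, random_state=1)

# Sample: tune parameters on a small subset only.
rng = np.random.default_rng(0)
idx = rng.choice(len(X_full), size=1_000, replace=False)
X_s, y_s = X_full[idx], y_full[idx]
_, best_eps = max(
    (adjusted_rand_score(y_s, DBSCAN(eps=eps).fit_predict(X_s)), eps)
    for eps in (0.2, 0.4, 0.8)
)

# Deploy: the sample-tuned parameters are reused on the full data unchanged.
full_ari = adjusted_rand_score(y_full,
                               DBSCAN(eps=best_eps).fit_predict(X_full))
print(f"sample-tuned eps={best_eps}, full-data ARI={full_ari:.3f}")
```

Sampling thins local density, so a radius tuned on the sample is systematically miscalibrated for the full data; count-style parameters sidestep that, which is the claimed mechanism behind AdaBox's clean transfer.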

The low-D stack solves the deployment problem, not just the clustering problem. SCOPE picks the right parameters. AdaBox finds the structure. SLCD scales it without re-tuning.
— Low-D stack architecture

BERTopic pipeline, four datasets, four methods.

The standard text clustering pipeline in production today is BERTopic: sentence-BERT embeddings → UMAP to two dimensions → HDBSCAN. This pipeline powers many social listening platforms, trend detection systems, and topic modeling tools. The benchmark below replaces only the final clustering step. Ada2D is scored on the same UMAP-reduced 2D data that HDBSCAN operates on; AdaHD clusters the original high-dimensional embeddings directly.

ARI and SCOPE heatmap across 4 text datasets and 4 methods
Adjusted Rand Index (left) and SCOPE score (right) across four real-world text datasets. AdaHD wins ARI on three of four datasets and SCOPE on all four. Source: Multi_Dataset_Text_Benchmark.ipynb

Average across all four datasets

Method                           | Avg ARI | Avg SCOPE | Style
AdaHD (high-D structure-centric) | 0.5015  | 0.7604    | Native HD
Ada2D (low-D structure-centric)  | 0.4261  | 0.5766    | UMAP → AdaBox
HDBSCAN* (tuned)                 | 0.3611  | 0.3595    | UMAP → HDBSCAN, hyper-tuned
HDBSCAN (default)                | 0.3516  | 0.3017    | UMAP → HDBSCAN, default
20-Newsgroups 6-category UMAP scatter, ground truth vs HDBSCAN vs HDBSCAN* vs Ada2D vs AdaHD
20 Newsgroups (6 categories, n=5,581) — UMAP 2D projection. HDBSCAN finds k=13 with 8% noise (ARI 0.821); HDBSCAN* fragments to k=50 with 56% noise (ARI 0.225); Ada2D finds k=4 with 0% noise; AdaHD finds k=7 with 0% noise (ARI 0.751). Source: Multi_Dataset_Text_Benchmark.ipynb (img-003)

When clustering quality drops, you know which property failed.

Existing supervised clustering metrics — ARI, NMI, V-Measure — produce a single score. A bad score tells you something is wrong, but not what. SCOPE produces five independent components that each measure a distinct structural property of the result:

Component         | Measures                                | Diagnostic
Core Purity       | How clean are dense interior regions?   | Low → over-merging of distinct clusters
Boundary Recall   | Are points near edges correctly placed? | Low → algorithm is too cautious at edges
Cluster Precision | Do clusters correspond 1:1 with truth?  | Low → fragmentation
Noise F1          | Are noise points correctly identified?  | Low → either over- or under-discarding
Count Accuracy    | Is k correct?                           | Low → wrong number of clusters

A single SCOPE score of 0.55 on a difficult dataset is not informative on its own. But knowing that Core Purity is 0.92 while Cluster Precision is 0.30 tells you exactly that the algorithm is fragmenting good clusters — not over-merging them, not picking up noise. That diagnostic capability is what makes SCOPE useful as a tuning objective.
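That diagnostic reading mechanizes trivially. The `diagnose` helper and its 0.5 threshold below are our illustration, not part of the published metric; the hint strings follow the component table.

```python
def diagnose(components, threshold=0.5):
    """Map low SCOPE components to the failure mode each one indicates."""
    hints = {
        "core_purity": "over-merging of distinct clusters",
        "boundary_recall": "too cautious at edges",
        "cluster_precision": "fragmentation",
        "noise_f1": "over- or under-discarding of noise",
        "count_accuracy": "wrong number of clusters",
    }
    return [hints[name] for name, score in components.items()
            if score < threshold]

# The scenario above: Core Purity 0.92 but Cluster Precision 0.30.
result = diagnose({
    "core_purity": 0.92,
    "boundary_recall": 0.70,
    "cluster_precision": 0.30,
    "noise_f1": 0.81,
    "count_accuracy": 0.60,
})
print(result)  # ['fragmentation']
```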

From a 1,000-point sample to half a million points without re-tuning.

SLCD protocol A results: AdaBox, DBSCAN, HDBSCAN deployment quality
Protocol A: Sample tuning, full-dataset deployment. Across 10 real-world datasets, AdaBox sample-tuned configurations transfer cleanly to the full data (Δ ARI +0.027 on average). DBSCAN loses 0.404 and HDBSCAN loses 0.475 in ARI under the same protocol. SLCD's parameter transfer is uniquely enabled by AdaBox's scale-invariant parameters. Source: AdaBox_KDD2026_Experimental_Notebook (img-056)
+0.027
AdaBox average Δ ARI
Scale-invariant parameters transfer cleanly.
−0.404
DBSCAN average Δ ARI
Distance parameters fail at scale.
−0.475
HDBSCAN average Δ ARI
Mutual reachability distance does not survive transfer.

The Low-D stack is the strongest choice when…

• The pipeline already includes UMAP, PCA, t-SNE, or another dimensionality reduction step.
• The data has under ~30 informative dimensions in its raw form.
• The deployment target requires a 2D visualization of the clustering.
• The application is text/topic clustering using sentence-BERT or similar embeddings (the BERTopic case).
• The application is customer segmentation, behavioral clustering, or trend detection on signal vectors.

For text and social-listening applications, the Low-D stack is a direct drop-in replacement for the HDBSCAN step in any UMAP-then-HDBSCAN pipeline. Integration cost: hours, not weeks.