Classification Metrics
The d9d framework provides robust, distributed-ready classification metrics designed to handle large-scale data smoothly.
Binary AUROC
d9d.metric.impl.classification.BinaryAUROCMetric
Computes approximated AUROC for binary classification using histograms.
Standard AUROC computation requires storing the entire history of predictions so they can be sorted and ranked. This implementation avoids that memory cost by discretizing predictions into fixed-size histograms.
This method employs a frequency-based sketching approach. It relies on the observation that the AUROC can be approximated by computing the area shared or separated by the probability density functions of the positive and negative classes. We maintain two separate histograms for positive and negative samples and apply the trapezoidal rule to estimate the area.
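The approach described above can be sketched in plain Python. This is an illustrative sketch of the histogram technique, not the d9d implementation: predictions are bucketed into fixed-width bins over [0, 1], so memory stays O(num_bins) rather than O(num_samples), and the ROC curve is integrated with the trapezoidal rule.

```python
# Illustrative sketch of histogram-based AUROC approximation (not d9d code).

def approx_auroc(probs, labels, num_bins=1000):
    # Maintain separate histograms for positive and negative samples.
    pos = [0] * num_bins
    neg = [0] * num_bins
    for p, y in zip(probs, labels):
        b = min(int(p * num_bins), num_bins - 1)  # clamp p == 1.0 into the last bin
        (pos if y == 1 else neg)[b] += 1

    total_pos, total_neg = sum(pos), sum(neg)
    # Sweep the decision threshold from high to low, accumulating true and
    # false positives, and integrate TPR over FPR with the trapezoidal rule.
    auc, tp, fp = 0.0, 0, 0
    prev_tpr, prev_fpr = 0.0, 0.0
    for b in reversed(range(num_bins)):
        tp += pos[b]
        fp += neg[b]
        tpr = tp / total_pos
        fpr = fp / total_neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
    return auc
```

With perfectly separated scores the sketch returns 1.0, and its accuracy for interleaved scores improves as num_bins grows.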
References
Albakour et al., "Fast and memory efficient AUC-ROC approximation for Stream Learning", 2021. https://www.researchgate.net/publication/353020448_Fast_and_memory_efficient_AUC-ROC_approximation_for_Stream_Learning
__init__(num_bins=10000)
Constructs the BinaryAUROCMetric object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_bins
|
int
|
Number of bins for histogram approximation. This parameter controls the trade-off between memory consumption and approximation accuracy. |
10000
|
update(probs, labels)
Updates the metric statistics with a new batch of predictions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| probs | Tensor | Predicted probabilities in range [0, 1]. | required |
| labels | Tensor | Ground truth binary labels. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
Confusion Matrix-based Metrics
When evaluating categorical outcomes, many standard statistics (Accuracy, Precision, Recall, F-Beta) share an underlying reliance on the Confusion Matrix.
To provide maximum flexibility and code reuse, d9d exposes a fluent, safe builder pattern via confusion_matrix_metric(). You define your metric in three distinct steps:
1. Problem Type: Define if this is binary, multiclass, or multilabel.
2. Statistic: Choose the formula to evaluate (e.g., with_accuracy, with_f1).
3. Aggregation: Choose how to reduce multi-dimensional data (micro, macro, weighted, or per_class).
Below are several common examples of how to assemble these configurations.
Binary Classification (Accuracy)
For a simple binary problem, you specify a probability threshold (usually 0.5). Using .with_accuracy() makes the metric calculate the overall correct predictions without needing complex aggregation.
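As a hedged sketch (the builder chain in the comment follows the method names documented on this page; the function below shows the equivalent computation in plain Python, not d9d internals):

```python
# Hypothetical d9d configuration, assembled from the builder methods below:
#   metric = confusion_matrix_metric().binary(threshold=0.5).with_accuracy().build()
# The computation such a metric performs, sketched in plain Python:

def binary_accuracy(probs, labels, threshold=0.5):
    # Threshold probabilities into hard predictions, then count matches.
    preds = [1 if p >= threshold else 0 for p in probs]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)
```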
Multiclass Classification (Top-5 Accuracy)
You can easily evaluate if the correct label appears within the top \(K\) predicted probabilities by passing top_k into the multiclass configuration. Since top_k treats the evaluation as a single broad "hit or miss", it effectively becomes a binary classification problem.
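A sketch of the top-K "hit or miss" evaluation (the commented builder chain is a plausible assembly of the documented methods, not verified d9d usage):

```python
# Hypothetical d9d configuration for Top-5 accuracy:
#   metric = confusion_matrix_metric().multiclass(num_classes=1000, top_k=5).with_accuracy().build()
# The underlying hit-or-miss check, sketched in plain Python:

def top_k_accuracy(scores, labels, k=5):
    hits = 0
    for row, y in zip(scores, labels):
        # Indices of the k highest-scoring classes for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += y in top_k
    return hits / len(labels)
```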
Multiclass Classification (Per-Class Precision)
Instead of collapsing results into a single global number, you might want to inspect the performance of strictly individual categories. Using the .per_class() aggregation bypasses global reductions entirely and returns a separate score (such as Precision) for every single class.
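A sketch of what a per-class configuration yields (the commented chain combines the documented builder methods hypothetically; the plain-Python function illustrates the result shape):

```python
# Hypothetical d9d configuration for per-class precision:
#   metric = confusion_matrix_metric().multiclass(num_classes=3).with_precision().per_class().build()
# Per-class precision from hard predictions, sketched in plain Python:

def per_class_precision(preds, labels, num_classes):
    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    for p, y in zip(preds, labels):
        if p == y:
            tp[p] += 1
        else:
            fp[p] += 1
    # One score per class; classes never predicted get 0.0 here.
    return [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
            for c in range(num_classes)]
```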
Multilabel Classification (Macro F1-Score)
For multilabel problems, multiple correct categories can exist simultaneously. Each class is evaluated independently against a probability threshold. To compute a single global metric value, you can use reductions like .macro() to average the specific statistic (e.g., F1-score) evenly across all classes, regardless of their individual sample frequency.
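A sketch of the multilabel macro-F1 computation (the commented chain is a hypothetical assembly of the documented builder methods; the function below is illustrative plain Python, not d9d code):

```python
# Hypothetical d9d configuration for multilabel macro F1:
#   metric = confusion_matrix_metric().multilabel(num_classes=2, threshold=0.5).with_f1().macro().build()
# Each class is thresholded and scored independently, then averaged evenly:

def macro_f1_multilabel(probs, labels, threshold=0.5):
    num_classes = len(labels[0])
    f1_scores = []
    for c in range(num_classes):
        tp = fp = fn = 0
        for row, truth in zip(probs, labels):
            pred = row[c] >= threshold
            if pred and truth[c]:
                tp += 1
            elif pred:
                fp += 1
            elif truth[c]:
                fn += 1
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    # Unweighted mean across classes, regardless of class frequency.
    return sum(f1_scores) / num_classes
```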
d9d.metric.impl.classification.confusion_matrix_metric()
Creates a new builder for configuring a ConfusionMatrixMetric.
Returns:
| Type | Description |
|---|---|
| ConfusionMatrixMetricBuilder | A fresh builder instance to begin metric pipeline configuration. |
d9d.metric.impl.classification.ConfusionMatrixMetricBuilder
Builder for safely configuring a ConfusionMatrixMetric pipeline.
__init__()
Constructs the ConfusionMatrixMetricBuilder object.
binary(threshold=0.5)
Configures the metric for binary classification problems, thresholding predicted probabilities at threshold.
build()
Bakes pipeline configurations into a ConfusionMatrixMetric.
Returns:
| Type | Description |
|---|---|
| ConfusionMatrixMetric | A ready-to-process configured metric wrapper instance. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the problem type or statistic calculation is not specified. |
macro()
Computes the metric for each class independently and finds their unweighted mean.
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
micro()
Computes the metric globally by summing the confusion matrices first.
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
multiclass(num_classes, top_k=None)
Configures the metric for multiclass classification problems.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| num_classes | int | The total number of unique mutually-exclusive classes. | required |
| top_k | int \| None | If provided, alters the underlying evaluation to measure if the target falls within the top K highest probabilities. | None |
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
multilabel(num_classes, threshold=0.5)
Configures the metric for multilabel classification problems.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| num_classes | int | The total number of unique independent classes. | required |
| threshold | float | Value boundary for assigning positive boolean hits independently. | 0.5 |
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
per_class()
Computes and returns the metric for each class independently without aggregating.
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
weighted()
Computes the metric for each class independently and finds their average weighted by the true instances (support) for each class.
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
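The difference between the macro() and weighted() reductions above can be made concrete with a small plain-Python sketch (illustrative only, not d9d code): macro averages per-class scores evenly, while weighted scales each score by its class support.

```python
# Contrast of macro vs. weighted averaging over per-class scores (illustrative).

def macro_vs_weighted(per_class_scores, support):
    # macro: unweighted mean -- every class counts equally.
    macro = sum(per_class_scores) / len(per_class_scores)
    # weighted: mean weighted by the number of true instances per class.
    total = sum(support)
    weighted = sum(s * n for s, n in zip(per_class_scores, support)) / total
    return macro, weighted
```

With scores [1.0, 0.5] and supports [9, 1], macro yields 0.75 while weighted yields 0.95: the rare class drags macro down far more than weighted.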
with_accuracy()
Assigns conventional accuracy computations as the target statistic to evaluate.
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
with_aggregation(method)
Configures the aggregation strategy used to reduce per-class statistics into the final metric value.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| method | ClassificationAggregationMethod | Identifier selecting one of the available aggregation strategies. | required |

Returns:

| Type | Description |
|---|---|
| Self | The current builder instance. |
with_f1()
Assigns harmonic mean calculations (F1) as the target statistic to evaluate.
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
with_fbeta(beta)
Assigns the F-beta score as the target statistic to evaluate.
with_precision()
Assigns precision (positive predictive value) as the target statistic to evaluate.
Returns:

| Type | Description |
|---|---|
| Self | The current builder instance. |
with_recall()
Assigns recall (sensitivity) as the target statistic to evaluate.
Returns:

| Type | Description |
|---|---|
| Self | The current builder instance. |
with_statistic(statistic)
Assigns a custom statistic that interprets the confusion matrix directly.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| statistic | ConfusionMatrixStatistic | An instantiated statistic implementing the ConfusionMatrixStatistic protocol. | required |

Returns:

| Type | Description |
|---|---|
| Self | The current builder instance. |