Classification Metrics
The d9d framework provides robust, distributed-ready classification metrics designed to handle large-scale data smoothly.
Binary AUROC
d9d.metric.impl.classification.BinaryAUROCMetric
Computes approximated AUROC for binary classification using histograms.
Standard AUROC computation requires storing the entire history of predictions so they can be sorted and ranked. This implementation avoids that memory cost by discretizing predictions into fixed-size histograms.
This method employs a frequency-based sketching approach. It relies on the observation that the AUROC can be approximated by computing the area shared or separated by the probability density functions of the positive and negative classes. We maintain two separate histograms for positive and negative samples and apply the trapezoidal rule to estimate the area.
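The approach described above can be sketched in plain Python. This is an illustrative sketch of the histogram technique, not the d9d implementation: predictions are bucketed into fixed-width bins over [0, 1], so memory stays O(num_bins) rather than O(num_samples), and the ROC curve is integrated with the trapezoidal rule.

```python
# Illustrative sketch of histogram-based AUROC approximation (not d9d code).

def approx_auroc(probs, labels, num_bins=1000):
    # Maintain separate histograms for positive and negative samples.
    pos = [0] * num_bins
    neg = [0] * num_bins
    for p, y in zip(probs, labels):
        b = min(int(p * num_bins), num_bins - 1)  # clamp p == 1.0 into the last bin
        (pos if y == 1 else neg)[b] += 1

    total_pos, total_neg = sum(pos), sum(neg)
    # Sweep the decision threshold from high to low, accumulating true and
    # false positives, and integrate TPR over FPR with the trapezoidal rule.
    auc, tp, fp = 0.0, 0, 0
    prev_tpr, prev_fpr = 0.0, 0.0
    for b in reversed(range(num_bins)):
        tp += pos[b]
        fp += neg[b]
        tpr = tp / total_pos
        fpr = fp / total_neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
    return auc
```

With perfectly separated scores the sketch returns 1.0, and its accuracy for interleaved scores improves as num_bins grows.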
References
Albakour et al., "Fast and memory efficient AUC-ROC approximation for Stream Learning", 2021. https://www.researchgate.net/publication/353020448_Fast_and_memory_efficient_AUC-ROC_approximation_for_Stream_Learning
__init__(num_bins=10000)
Constructs the BinaryAUROCMetric object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_bins
|
int
|
Number of bins for histogram approximation. This parameter controls the trade-off between memory consumption and approximation accuracy. |
10000
|
update(probs, labels)
Updates the metric statistics with a new batch of predictions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| probs | Tensor | Predicted probabilities in range [0, 1]. | required |
| labels | Tensor | Ground truth binary labels. | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
Confusion Matrix-based Metrics
When evaluating categorical outcomes, many standard statistics (Accuracy, Precision, Recall, F-Beta) share an underlying reliance on the Confusion Matrix.
To provide maximum flexibility and code reuse, d9d exposes a fluent, safe builder pattern via confusion_matrix_metric(). You define your metric in three distinct steps:
1. Problem Type: Define if this is binary, multiclass, or multilabel.
2. Statistic: Choose the formula to evaluate (e.g., with_accuracy, with_f1).
3. Aggregation: Choose how to reduce multi-dimensional data (micro, macro, weighted, or per_class).
Below are several common examples of how to assemble these configurations.
Binary Classification (Accuracy)
For a simple binary problem, you specify a probability threshold (usually 0.5). Using .with_accuracy() makes the metric calculate the overall correct predictions without needing complex aggregation.
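As a hedged sketch (the builder chain in the comment follows the method names documented on this page; the function below shows the equivalent computation in plain Python, not d9d internals):

```python
# Hypothetical d9d configuration, assembled from the builder methods below:
#   metric = confusion_matrix_metric().binary(threshold=0.5).with_accuracy().build()
# The computation such a metric performs, sketched in plain Python:

def binary_accuracy(probs, labels, threshold=0.5):
    # Threshold probabilities into hard predictions, then count matches.
    preds = [1 if p >= threshold else 0 for p in probs]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)
```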
Multiclass Classification (Top-5 Accuracy)
You can easily evaluate if the correct label appears within the top \(K\) predicted probabilities by passing top_k into the multiclass configuration. Since top_k treats the evaluation as a single broad "hit or miss", it effectively becomes a binary classification problem.
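A sketch of the top-K "hit or miss" evaluation (the commented builder chain is a plausible assembly of the documented methods, not verified d9d usage):

```python
# Hypothetical d9d configuration for Top-5 accuracy:
#   metric = confusion_matrix_metric().multiclass(num_classes=1000, top_k=5).with_accuracy().build()
# The underlying hit-or-miss check, sketched in plain Python:

def top_k_accuracy(scores, labels, k=5):
    hits = 0
    for row, y in zip(scores, labels):
        # Indices of the k highest-scoring classes for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += y in top_k
    return hits / len(labels)
```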
Multiclass Classification (Per-Class Precision)
Instead of collapsing results into a single global number, you might want to inspect the performance of strictly individual categories. Using the .per_class() aggregation bypasses global reductions entirely and returns a separate score (such as Precision) for every single class.
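A sketch of what a per-class configuration yields (the commented chain combines the documented builder methods hypothetically; the plain-Python function illustrates the result shape):

```python
# Hypothetical d9d configuration for per-class precision:
#   metric = confusion_matrix_metric().multiclass(num_classes=3).with_precision().per_class().build()
# Per-class precision from hard predictions, sketched in plain Python:

def per_class_precision(preds, labels, num_classes):
    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    for p, y in zip(preds, labels):
        if p == y:
            tp[p] += 1
        else:
            fp[p] += 1
    # One score per class; classes never predicted get 0.0 here.
    return [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
            for c in range(num_classes)]
```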
Multilabel Classification (Macro F1-Score)
For multilabel problems, multiple correct categories can exist simultaneously. Each class is evaluated independently against a probability threshold. To compute a single global metric value, you can use reductions like .macro() to average the specific statistic (e.g., F1-score) evenly across all classes, regardless of their individual sample frequency.
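A sketch of the multilabel macro-F1 computation (the commented chain is a hypothetical assembly of the documented builder methods; the function below is illustrative plain Python, not d9d code):

```python
# Hypothetical d9d configuration for multilabel macro F1:
#   metric = confusion_matrix_metric().multilabel(num_classes=2, threshold=0.5).with_f1().macro().build()
# Each class is thresholded and scored independently, then averaged evenly:

def macro_f1_multilabel(probs, labels, threshold=0.5):
    num_classes = len(labels[0])
    f1_scores = []
    for c in range(num_classes):
        tp = fp = fn = 0
        for row, truth in zip(probs, labels):
            pred = row[c] >= threshold
            if pred and truth[c]:
                tp += 1
            elif pred:
                fp += 1
            elif truth[c]:
                fn += 1
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    # Unweighted mean across classes, regardless of class frequency.
    return sum(f1_scores) / num_classes
```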
d9d.metric.impl.classification.confusion_matrix_metric()
Creates a new builder for configuring a ConfusionMatrixMetric.
Returns:
| Type | Description |
|---|---|
| ConfusionMatrixMetricBuilder | A fresh builder instance to begin metric pipeline configuration. |
d9d.metric.impl.classification.ConfusionMatrixMetricBuilder
Builder for safely configuring a ConfusionMatrixMetric pipeline.
__init__()
Constructs the ConfusionMatrixMetricBuilder object.
binary(threshold=0.5)
Configures the metric for binary classification problems, thresholding predicted probabilities at threshold.
build()
Bakes pipeline configurations into a ConfusionMatrixMetric.
Returns:
| Type | Description |
|---|---|
| ConfusionMatrixMetric | A ready-to-process configured metric wrapper instance. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the problem type or statistic calculation is not specified. |
macro()
Computes the metric for each class independently and finds their unweighted mean.
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
micro()
Computes the metric globally by summing the confusion matrices first.
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
multiclass(num_classes, top_k=None)
Configures the metric for multiclass classification problems.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| num_classes | int | The total number of unique mutually-exclusive classes. | required |
| top_k | int \| None | If provided, alters the underlying evaluation to measure if the target falls within the top K highest probabilities. | None |
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
multilabel(num_classes, threshold=0.5)
Configures the metric for multilabel classification problems.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| num_classes | int | The total number of unique independent classes. | required |
| threshold | float | Value boundary for assigning positive boolean hits independently. | 0.5 |
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
per_class()
Computes and returns the metric for each class independently without aggregating.
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
weighted()
Computes the metric for each class independently and finds their average weighted by the true instances (support) for each class.
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
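The difference between the macro() and weighted() reductions above can be made concrete with a small plain-Python sketch (illustrative only, not d9d code): macro averages per-class scores evenly, while weighted scales each score by its class support.

```python
# Contrast of macro vs. weighted averaging over per-class scores (illustrative).

def macro_vs_weighted(per_class_scores, support):
    # macro: unweighted mean -- every class counts equally.
    macro = sum(per_class_scores) / len(per_class_scores)
    # weighted: mean weighted by the number of true instances per class.
    total = sum(support)
    weighted = sum(s * n for s, n in zip(per_class_scores, support)) / total
    return macro, weighted
```

With scores [1.0, 0.5] and supports [9, 1], macro yields 0.75 while weighted yields 0.95: the rare class drags macro down far more than weighted.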
with_accuracy()
Assigns conventional accuracy computations as the target statistic to evaluate.
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
with_aggregation(method)
Configures the aggregation strategy used to reduce per-class statistics into the final metric value.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| method | ClassificationAggregationMethod | Identifier selecting one of the available aggregation strategies. | required |

Returns:

| Type | Description |
|---|---|
| Self | The current builder instance. |
with_f1()
Assigns harmonic mean calculations (F1) as the target statistic to evaluate.
Returns:
| Type | Description |
|---|---|
| Self | The current builder instance. |
with_fbeta(beta)
Assigns the F-beta score as the target statistic to evaluate.
with_precision()
Assigns precision (positive predictive value) as the target statistic to evaluate.
Returns:

| Type | Description |
|---|---|
| Self | The current builder instance. |
with_recall()
Assigns recall (sensitivity) as the target statistic to evaluate.
Returns:

| Type | Description |
|---|---|
| Self | The current builder instance. |
with_statistic(statistic)
Assigns a custom statistic that interprets the confusion matrix directly.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| statistic | ConfusionMatrixStatistic | An instantiated statistic implementing the ConfusionMatrixStatistic protocol. | required |

Returns:

| Type | Description |
|---|---|
| Self | The current builder instance. |