<aside> ✍️ Andreas, March 5, 2021, with discussion feedback from Jannik

</aside>

Hypothesis

The epistemic uncertainty of an ensemble is, in general, not a useful measure for OoD detection; in particular, it is model-dependent.

This also calls into question computing epistemic uncertainty from a single layer (the softmax outputs) while treating the underlying models as black boxes.

Epistemic Uncertainty

We view the members of an ensemble as being drawn from a distribution $\omega\sim\hat{p}(\omega)$. Then, we can use the well-known BALD equation:

$$ \underbrace{\operatorname{H}[Y|x]}_{\text{predictive entropy}} = \underbrace{\operatorname{I}[Y;\Omega|x]}_{\text{epistemic uncertainty}} + \underbrace{\mathbb{E}_{\hat{p}(\omega)}\operatorname{H}[Y|x,\omega]}_{\text{expected softmax entropy}}, $$

where $\operatorname{H}[Y|x,\omega]$ is the softmax entropy of a single specific model $\omega$.
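
To make the decomposition concrete, here is a minimal NumPy sketch (the array name `member_probs` and the helper functions are assumptions, not from any particular codebase): the epistemic term is the predictive entropy of the averaged prediction minus the average per-member softmax entropy.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) along the last axis."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def bald_decomposition(member_probs):
    """member_probs: softmax outputs of shape (n_members, n_classes) for one input x."""
    mean_probs = member_probs.mean(axis=0)              # ensemble prediction p(y|x)
    predictive_entropy = entropy(mean_probs)            # H[Y|x]
    expected_entropy = entropy(member_probs).mean()     # E_{p̂(ω)} H[Y|x, ω]
    epistemic = predictive_entropy - expected_entropy   # I[Y; Ω|x]
    return predictive_entropy, epistemic, expected_entropy
```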

Epistemic Uncertainty measures the disagreement between the predictions of the different models within the ensemble.

The epistemic uncertainty of a Deep Ensemble has empirically been found to be superior to that of MC Dropout.

Epistemic Uncertainty for OoD detection

Predictive Entropy and Epistemic Uncertainty are seen as good metrics for detecting OoD samples: a high value is indicative of OoD data.
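
As a sketch of how such a metric is typically used (the scores and threshold here are arbitrary assumptions): score every input with predictive entropy or epistemic uncertainty and flag the highest-scoring inputs as OoD.

```python
import numpy as np

def flag_ood(uncertainty_scores, threshold):
    """Flag inputs whose uncertainty exceeds a (e.g. validation-chosen) threshold."""
    return np.asarray(uncertainty_scores) > threshold

print(flag_ood([0.05, 0.12, 2.1], threshold=1.0))  # [False False  True]
```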

<aside> 👉 DDU makes the claim that we can equate epistemic uncertainty with the sample density in feature space, which is also very useful for OoD detection.

</aside>

The SNGP paper shows that:

A classification model ought to output uniform predictions for OoD data.

$$ p(y \mid \mathbf{x}) = p\left(y \mid \mathbf{x}, \mathbf{x} \in \mathscr{X}_{\mathrm{IND}}\right) \cdot p^{*}\left(\mathbf{x} \in \mathscr{X}_{\mathrm{IND}}\right) + p_{\text{uniform}}\left(y \mid \mathbf{x}, \mathbf{x} \notin \mathscr{X}_{\mathrm{IND}}\right) \cdot p^{*}\left(\mathbf{x} \notin \mathscr{X}_{\mathrm{IND}}\right) $$

Entropy is maximal for a uniform distribution. Hence, this view aligns with high predictive entropy being indicative of OoD data.
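
As a quick sanity check (assuming $K$ classes), the uniform prediction attains the largest possible predictive entropy:

$$ \operatorname{H}[Y|x] = -\sum_{k=1}^{K} \frac{1}{K}\log\frac{1}{K} = \log K. $$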

Is Epistemic Uncertainty as defined above always a good metric for OoD detection though?

What happens if the models in an ensemble become better at quantifying OoD?

"Quantifying OoD" refers to having a softmax entropy closer uniform for OoD data, as specified as optimum by SNGP above.

<aside> 👉 This can happen, for example, when we train on OoD data, as many current approaches to OoD detection do.

</aside>
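
A minimal NumPy sketch of the resulting failure mode (all numbers are illustrative assumptions): if every ensemble member follows the SNGP optimum and outputs a (near-)uniform softmax on an OoD input, the members no longer disagree, so the epistemic uncertainty $\operatorname{I}[Y;\Omega|x]$ collapses to (almost) zero even though the predictive entropy stays maximal.

```python
import numpy as np

def entropy(p, eps=1e-12):
    return -np.sum(p * np.log(p + eps), axis=-1)

n_members, n_classes = 5, 10
# Every ensemble member outputs the uniform distribution on an OoD input.
ood_probs = np.full((n_members, n_classes), 1.0 / n_classes)

predictive_entropy = entropy(ood_probs.mean(axis=0))   # ≈ log(10) ≈ 2.30, maximal
expected_entropy = entropy(ood_probs).mean()           # also ≈ 2.30
epistemic = predictive_entropy - expected_entropy      # ≈ 0: no disagreement left

print(predictive_entropy, expected_entropy, epistemic)
```

In other words, the better the individual members behave on OoD data, the less informative the ensemble's epistemic uncertainty becomes as an OoD score.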