Consider two random variables $Y$ and $Z$. We assume that we have access to side-information $Z$ to help us “explain” $Y$. For example, $Z$ could be the model parameters (and it is worth keeping that example in mind for what follows).
Information Theory. The entropy $\operatorname{H}[Z]$ captures the uncertainty about $Z$. The mutual information $\operatorname{I}[Y ; Z]$ captures how much $Z$ tells us about $Y$ (and vice-versa). Similarly, the conditional entropy $\operatorname{H}[Y \mid Z]$ captures the uncertainty that is left in $Y$ after we have observed $Z$. Overall, we have the well-known decomposition of the total uncertainty, which follows directly from the definition of mutual information:
$$ \underbrace{\operatorname{H}[Y]}_{\text{(total) uncertainty}} = \underbrace{\operatorname{H}[Y \mid Z]}_{\text{unexplained uncertainty}} + \underbrace{\operatorname{I}[Y ; Z]}_{\text{explained uncertainty}}. $$
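As a concrete illustration (a made-up toy distribution, not from the original discussion), here is a quick numerical check of this decomposition for two binary random variables:

```python
import numpy as np

# A made-up joint distribution p(y, z) over two binary variables (illustration only).
p_yz = np.array([[0.4, 0.1],
                 [0.1, 0.4]])  # rows index y, columns index z

p_y = p_yz.sum(axis=1)  # marginal p(y)
p_z = p_yz.sum(axis=0)  # marginal p(z)

def entropy(p):
    """Shannon entropy in nats."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

H_y = entropy(p_y)
# Conditional entropy H[Y | Z] = sum_z p(z) * H[Y | Z = z]
H_y_given_z = sum(p_z[j] * entropy(p_yz[:, j] / p_z[j]) for j in range(2))
# Mutual information computed directly from its definition
I_yz = np.sum(p_yz * np.log(p_yz / np.outer(p_y, p_z)))

print(np.isclose(H_y, H_y_given_z + I_yz))  # True: H[Y] = H[Y|Z] + I[Y;Z]
```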
Law of Total Variance. Similarly, the law of total variance states that for random variables $Y$ and $Z$ on the same probability space:
$$ \underbrace{\operatorname{Var}(Y)}_{\text{(total) uncertainty}} = \underbrace{\mathbb{E}[\operatorname{Var}(Y \mid Z)]}_{\text{unexplained uncertainty}} + \underbrace{\operatorname{Var}(\mathbb{E}[Y \mid Z])}_{\text{explained uncertainty}}. $$
Here, the overall variance captures the total uncertainty, the expected variance after conditioning captures the unexplained (residual) variance, and the variance of the conditional mean captures the explained variance.
More concretely, $\mathbb{E}[Y \mid Z]$ is commonly the best model prediction under squared error (RMSE), while $\operatorname{Var}(Y \mid Z)$ captures how large the remaining unexplained error is, in the same squared-error sense (we see this when we expand the definition of $\operatorname{Var}$).
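To make this tangible, here is a minimal Monte Carlo sketch (again a made-up toy model, where $Z \sim \mathcal{N}(0, 1)$ and $Y \mid Z \sim \mathcal{N}(Z, 0.5^2)$), checking the decomposition numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Toy generative model (an assumption for illustration):
# Z ~ N(0, 1),  Y | Z ~ N(Z, sigma^2) with sigma = 0.5.
sigma = 0.5
z = rng.normal(0.0, 1.0, size=n)
y = rng.normal(z, sigma)

total_var = y.var()       # Var(Y)
unexplained = sigma**2    # E[Var(Y | Z)]  (constant in this toy model)
explained = z.var()       # Var(E[Y | Z])  since E[Y | Z] = Z here

print(total_var, unexplained + explained)  # both approximately 1.25
```

Here $\mathbb{E}[Y \mid Z] = Z$ is the best prediction, $\operatorname{Var}(Y \mid Z) = 0.25$ is the residual error, and the two terms add up to the total variance as claimed.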
See also https://en.wikipedia.org/wiki/Law_of_total_variance and https://en.wikipedia.org/wiki/Mutual_information#Relation_to_conditional_and_joint_entropy.
The big question that arises from having two different decompositions is whether they agree:
Do $\operatorname{Var}(\mathbb{E}[Y \mid Z])$ and $\operatorname{I}[Y ; Z]$ capture the same quantities, and if not, how do they differ?
Now, variance and entropy are generally not linearly related. However, for any univariate (continuous) random variable $X$ with finite variance, we have:
$$ \operatorname{H}[X]\le\frac{1}{2}\log (2\pi e \, \operatorname{Var}(X)), $$
where we use that the Gaussian distribution is the maximum entropy distribution given a fixed variance—with equality when $X$ follows a univariate normal distribution.
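As a quick sanity check (a made-up example): for $X \sim \operatorname{Uniform}(0, 1)$ we have $\operatorname{H}[X] = 0$ and $\operatorname{Var}(X) = \tfrac{1}{12}$, so
$$ \operatorname{H}[X] = 0 \;\le\; \frac{1}{2}\log\left(\frac{2\pi e}{12}\right) \approx 0.18, $$
and the inequality is strict because $X$ is not Gaussian.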
Univariate normal $Y, Z$. Let’s assume $Y$ and $Y \mid Z = z$ (for each $z$) follow univariate normal distributions (really the simplest modeling assumption). Then, we have:
$$ \begin{aligned} \operatorname{I}[Y;Z] &= \operatorname{H}[Y] - \operatorname{H}[Y \mid Z]\\ &= \frac{1}{2}\log (2\pi e \, \operatorname{Var}(Y)) - \mathbb{E}_Z \left[\frac{1}{2}\log (2\pi e \, \operatorname{Var}(Y \mid z))\right]\\ &= \frac{1}{2}\log \operatorname{Var}(Y) - \mathbb{E}_Z \left[\frac{1}{2}\log \operatorname{Var}(Y \mid z)\right]. \end{aligned} $$
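As an aside (an illustrative special case, not part of the derivation below): if the conditional variance happens to be a constant, $\operatorname{Var}(Y \mid z) = \sigma^2$ for all $z$ (the homoscedastic case), then the law of total variance gives $\operatorname{Var}(Y) = \operatorname{Var}(\mathbb{E}[Y \mid Z]) + \sigma^2$, and the expression above simplifies to
$$ \operatorname{I}[Y;Z] = \frac{1}{2}\log \frac{\operatorname{Var}(Y)}{\sigma^2} = \frac{1}{2}\log\left(1 + \frac{\operatorname{Var}(\mathbb{E}[Y \mid Z])}{\sigma^2}\right), $$
so even in this simple case the mutual information is a monotone (logarithmic) transform of the explained variance rather than the explained variance itself.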
We can now use Jensen’s inequality in two ways to obtain both an upper and a lower bound ($\log$ is concave and $-\log$ is convex):