Given a Bayesian network, sometimes we would like to make a decision based on the information in the Bayesian network. This is called decision making and maximum expected utility is often used for finding the optimal action or policy.
In this blog post, I would like to discuss maximum expected utility for decision making.
Decision making situation $\mathcal{D}$, we have the random variable action $A$ and the random variable state $X$. Note that the action variable and the state variable can both be multivariate. $X$ and $A$ forms a Bayesian network $G(X, A)$. The utility function given $A$ and $X$ in the Bayesian network is $U(X, A)$.
The goal of decision making is find the optimal action or policy such that the expected utility is maximized.
We define the expected utility given action $A$ as
\[\begin{align} \mathbb{E} [U(A)] &= \sum_{x \in X} P(X = x | A) U(X = x, A) \\ \end{align}\]The optimal action $a^{\ast}$ that maximizes the expected utility is
\[\DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator*{\argmax}{argmax} \begin{align} a^{\ast} = \argmax_{a \in A} \mathbb{E} [U(A = a)] \end{align}\]In this particular setting, the state variable $X$ has no influence on the action variable $A$, but the action variable $A$ might have influence on the state variable $X$.
More generally, the state variable $X$ could affect the action variable $A$ and vise versa, so we have to use the joint distribution $P(X, A)$.
\[\begin{align} \mathbb{E} [U] &= \sum_{x \in X, a \in A} P(X = x, A = a) U(X = x, A = a) \\ \end{align}\]At the first glance, this is an constant value that dependents on neither $X$ nor $A$. What is the variable for optimization? Let’s factorize $P(X, A)$ using a Bayesian network first. Suppose $X$ is multivariate and $X = \{X_1, X_2, \cdots, X_n\}$,
\[\begin{align} P(X, A) = \prod_{i}^{n} P(X_i | \text{Pa}(X_i) ) P(A | \text{Pa}(A)) \end{align}\]We found $P(A | \text{Pa}(A))$ is actually a policy, sometimes referred as decision rule, that remains to be optimized. We further denote $P(A | \text{Pa}(A))$ as $\pi(A | Z)$ or $\delta(A | Z)$ where $Z = \text{Pa}(A)$, and $W = X - Z - A$.
\[\begin{align} \mathbb{E} [U(\pi)] &= \sum_{X, A} P(X, A) U(X, A) \\ &= \sum_{X_1, X_2, \cdots, X_n, A} \bigg( \prod_{i}^{n} P(X_i | \text{Pa}(X_i) ) \pi(A | Z) U(X, A) \bigg) \\ &= \sum_{Z, A} \sum_{W} \bigg( \prod_{i}^{n} P(X_i | \text{Pa}(X_i) ) \pi(A | Z) U(X, A) \bigg) \\ &= \sum_{Z, A} \pi(A | Z) \sum_{W} \bigg( \prod_{i}^{n} P(X_i | \text{Pa}(X_i) ) U(X, A) \bigg) \\ \end{align}\]Note that $\sum_{W} \Big( \prod_{i}^{n} P(X_i | \text{Pa}(X_i) ) U(X, A) \Big)$ is just a function that depends on $A$ and $Z$, we denote
\[\mu(A, Z) = \sum_{W} \bigg( \prod_{i}^{n} P(X_i | \text{Pa}(X_i) ) U(X, A) \bigg)\]Therefore, $\mathbb{E} [U(\pi)]$ could be abbreviated as
\[\begin{align} \mathbb{E} [U(\pi)] &= \sum_{Z, A} \pi(A | Z) \mu(A, Z) \\ \end{align}\] \[\begin{align} \pi^{\ast} &= \argmax_{\pi} \mathbb{E} [U(\pi)] \\ &= \argmax_{\pi} \sum_{Z, A} \pi(A | Z) \mu(A, Z) \\ &= \argmax_{\pi} \sum_{Z} \sum_{A} \pi(A | Z) \mu(A, Z) \\ \end{align}\]It is not difficult to find that the optimal policy $\pi^{\ast}$ is
\[\begin{align} \pi^{\ast}(A | Z) &= \begin{cases} 1 & \text{if $A = \argmax_{A} \mu(A, Z)$}\\ 0 & \text{otherwise}\\ \end{cases} \\ \end{align}\]In this case the probability distribution $\pi^{\ast}$ degenerates to a deterministic function.
\[a^{\ast} = \argmax_{a \in A} \mu(A = a, Z = z)\]In this particular setting, we would need to observe $Z$ in order to find the optimal action.
These fundamental concepts are crucial for Markov decision process and reinforcement learning.
Maximum Expected Utility and Decision Making was originally published by Lei Mao at Lei Mao's Log Book on June 21, 2021.