Commit 4e2855e3 authored by Samuel Jia Cong Chua

edited equation formatting

parent cf28fbb0
@@ -62,9 +62,9 @@ $$
where $\mathcal{A}=\{-M, \ldots, M\}$ is the action space: negative actions move the agent left, positive actions move it right, and $a_n = 0$ keeps it still. The state space is $\mathcal{X}=\{-L, \ldots, L\}$, so $|\mathcal{X}| = 2L+1$. To make the model stochastic, an additional noise term $\epsilon_n \sim \mathcal{N}(0,1)$ perturbs the chosen action; the noise is discretized over $\{-3 \sigma, \ldots, 3 \sigma\}$. The reward function is:
$$
r\left(x_n, a_n, \mu_n\right)=\left[-\frac{1}{2}\left|a_n\right|^2+q a_n\left(m_n-x_n\right)-\frac{\kappa}{2}\left(m_n-x_n\right)^2\right] \Delta_n
$$
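where $m_n=\sum_{x \in \mathcal{X}} x \mu_n(x)$ is the first moment (mean) of the population distribution $\mu_n$. For positive $q$ and $\kappa$, the terms $q a_n\left(m_n-x_n\right)$ and $-\frac{\kappa}{2}\left(m_n-x_n\right)^2$ reward agents for moving toward, and staying close to, the population mean, while the quadratic cost $-\frac{1}{2}\left|a_n\right|^2$ penalizes large actions, so agents trade off following the crowd against the cost of moving.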
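Once $\mu_n$ is stored as a probability mass over $\mathcal{X}$, the reward above is straightforward to evaluate. The Python sketch below is illustrative only: the parameter values (`L`, `M`, `SIGMA`, `q`, `kappa`, `dt`), the function names, and the noise discretization rule (weights proportional to the Gaussian density at each grid point, then normalized) are assumptions, not taken from the original code.

```python
import math

# Hypothetical parameters (not from the source): L is the half-width of the
# state space, M the largest action magnitude, SIGMA the noise scale.
L = 10
M = 3
SIGMA = 1.0

# X = {-L, ..., L} has 2L + 1 states; A = {-M, ..., M} has 2M + 1 actions.
STATES = list(range(-L, L + 1))
ACTIONS = list(range(-M, M + 1))


def discretized_noise(sigma: float = SIGMA) -> dict:
    """Return a pmf approximating N(0, sigma^2) on {-3*sigma, ..., 3*sigma}.

    Assumption: each point of the unit-spaced grid gets a weight proportional
    to the Gaussian density there, and the weights are normalized to sum to 1.
    """
    k = int(round(3 * sigma))
    weights = {e: math.exp(-0.5 * (e / sigma) ** 2) for e in range(-k, k + 1)}
    total = sum(weights.values())
    return {e: w / total for e, w in weights.items()}


def population_mean(mu: dict) -> float:
    """First moment m_n = sum_x x * mu_n(x) of the population distribution."""
    return sum(x * p for x, p in mu.items())


def reward(x: int, a: int, mu: dict, q: float, kappa: float, dt: float) -> float:
    """r(x_n, a_n, mu_n) = [-a^2/2 + q*a*(m - x) - kappa/2*(m - x)^2] * dt."""
    m = population_mean(mu)
    gap = m - x  # signed distance from the agent to the population mean
    return (-0.5 * a ** 2 + q * a * gap - 0.5 * kappa * gap ** 2) * dt


if __name__ == "__main__":
    mu = {-1: 0.5, 1: 0.5}  # toy population with mean 0
    print(reward(x=2, a=-1, mu=mu, q=0.01, kappa=0.5, dt=1.0))
```

With the toy population `{-1: 0.5, 1: 0.5}` (mean 0), an agent at $x_n = 2$ taking $a_n = -1$ moves toward the mean, so the $q$ term is positive, but the distance penalty and action cost dominate and the script prints roughly -1.48.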