where $\mathcal{A}=\{-M, \ldots, M\}$ is the action space, whose elements correspond to moving left, moving right, or staying still. The state space is $\mathcal{X}=\{-L, \ldots, L\}$, with cardinality $|\mathcal{X}|=2L+1$. To add stochasticity to the model, an additional noise term $\epsilon_n \sim \mathcal{N}(0,1)$, discretized over $\{-3 \sigma, \ldots, 3 \sigma\}$, perturbs the chosen action. The reward function is:
where $m_n=\sum_{x \in \mathcal{X}} x \mu_n(x)$ is the first moment of the population distribution $\mu_n$. It enters the reward to encourage agents to move toward the population's average position, while the reward also tries to keep the agents moving.
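
As an illustration of the quantities defined above, the following is a minimal Python sketch, not the implementation used in this work. It builds the state space $\mathcal{X}$, computes the first moment $m_n$ of a distribution $\mu_n$, and discretizes the Gaussian noise. The function names, the step size $\sigma$ between noise support points, and the choice of weights proportional to the Gaussian density at each point are all assumptions made for illustration.

```python
import numpy as np


def state_space(L):
    """Return the states {-L, ..., L} as an integer array of length 2L + 1."""
    return np.arange(-L, L + 1)


def first_moment(states, mu):
    """Compute m_n = sum_x x * mu_n(x) for a distribution mu over the states."""
    return float(np.dot(states, mu))


def discretized_noise(sigma=1.0, n_std=3):
    """Discretize a zero-mean Gaussian over {-3*sigma, ..., 3*sigma}.

    Assumption: support points are spaced sigma apart, and each point gets a
    weight proportional to the Gaussian density there, renormalized to sum to 1.
    """
    support = sigma * np.arange(-n_std, n_std + 1)
    weights = np.exp(-0.5 * (support / sigma) ** 2)
    return support, weights / weights.sum()


if __name__ == "__main__":
    states = state_space(5)
    mu = np.full(len(states), 1.0 / len(states))  # uniform population
    print("number of states:", len(states))
    print("first moment:", first_moment(states, mu))
    support, probs = discretized_noise(sigma=1.0)
    print("noise support:", support)
    print("noise probabilities:", probs)
```

For a population distribution that is symmetric around zero, such as the uniform one in the usage example, the first moment is zero, so the target position implied by $m_n$ is the center of the state space.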