Learning to Optimize

Abstract: We consider decision-making by boundedly-rational agents in dynamic stochastic environments. The behavioral primitive is anchored to the shadow price of the state vector. Our agent forecasts the value of an additional unit of the state tomorrow using estimated models of shadow prices and transition dynamics, and uses this forecast to choose her control today. The control decision, together with the agent’s forecast of tomorrow’s shadow price, is then used to update the perceived shadow price of today’s states. By following this boundedly optimal procedure, the agent’s decision rule converges over time to the optimal policy. Specifically, within standard linear-quadratic environments, we obtain general conditions for asymptotically optimal decision-making: agents learn to optimize. Our results carry over to closely related procedures based on value-function learning and Euler-equation learning. We provide examples showing that shadow-price learning extends to general dynamic stochastic decision-making environments and embeds naturally in general-equilibrium models.
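To make the recursion in the abstract concrete, here is a minimal sketch of shadow-price learning in a small linear-quadratic problem. It is an illustration only: the matrices A, B, Q, R, the discount factor, the noise scale, and the recursive-least-squares gain are assumptions chosen for the example, not values or conditions from the paper.

```python
import numpy as np

# Sketch of shadow-price learning in an LQ problem (illustrative primitives):
#   maximize E sum_t beta^t (x_t' Q x_t + u_t' R u_t)
#   subject to x_{t+1} = A x_t + B u_t + eps_{t+1},
# with Q, R negative definite so the objective is concave.
rng = np.random.default_rng(0)
beta = 0.95
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Q = -np.eye(2)
R = -np.eye(1)

H = np.zeros((2, 2))   # perceived shadow-price model: lambda_t = H x_t
M = np.eye(2)          # RLS estimate of the regressors' second moment
x = rng.normal(size=(2, 1))

for t in range(1, 50001):
    # 1. Forecast tomorrow's shadow price as E_t lambda_{t+1} = H (A x + B u)
    #    and solve the first-order condition 2 R u + beta B' E_t lambda_{t+1} = 0.
    u = -np.linalg.solve(2 * R + beta * B.T @ H @ B,
                         beta * B.T @ H @ A @ x)

    # 2. Perceived shadow price of today's state via the envelope condition:
    #    lambda_t = 2 Q x_t + beta A' E_t lambda_{t+1}.
    lam = 2 * Q @ x + beta * A.T @ (H @ (A @ x + B @ u))

    # 3. Recursive least squares: regress lambda_t on x_t to update H.
    g = 1.0 / (t + 1)
    M += g * (x @ x.T - M)
    H += g * (lam - H @ x) @ np.linalg.solve(M, x).T

    # 4. Realized state transition.
    x = A @ x + B @ u + 0.1 * rng.normal(size=(2, 1))

# Under conditions of the kind the paper studies, the decision rule u = -F(H) x
# should converge to the optimal LQ policy (here H would converge to twice the
# Riccati matrix of the problem, given the x'Qx + u'Ru payoff convention).
print(H)
```

The sketch follows the abstract's three steps in order: forecast tomorrow's shadow price from the estimated models, use the forecast to choose today's control, then update the perceived shadow price of today's state by least squares.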

The full paper can be found here.