Papers
Reinforcement Learning and Consumption-Savings Behavior (Job Market Paper)
Brandon Kaplowitz
Working Paper
macroeconomics, consumption, reinforcement learning
This paper demonstrates how reinforcement learning can explain two puzzling empirical patterns in household consumption behavior during economic downturns. I develop a model where agents use Q-learning with neural network approximation to make consumption-savings decisions under income uncertainty, departing from standard rational expectations assumptions. The model replicates two key findings from recent literature: (1) unemployed households with previously low liquid assets exhibit substantially higher marginal propensities to consume (MPCs) out of stimulus transfers compared to high-asset households (0.50 vs. 0.34), even when neither group faces borrowing constraints, consistent with Ganong et al. (2024); and (2) households with more past unemployment experiences maintain persistently lower consumption levels after controlling for current economic conditions, a “scarring” effect documented by Malmendier and Shen (2024). Unlike existing explanations based on belief updating about income risk or ex-ante heterogeneity, the reinforcement learning mechanism generates both higher MPCs and lower consumption levels simultaneously through value function approximation errors that evolve with experience. Simulation results closely match the empirical estimates, suggesting that adaptive learning through reinforcement learning provides a unifying framework for understanding how past experiences shape current consumption behavior beyond what current economic conditions would predict.
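A minimal sketch of the mechanism, with a lookup table standing in for the paper’s neural-network approximation: the agent picks a consumption share of cash on hand by epsilon-greedy Q-learning over a discretized asset grid. All grids, parameter values, and the two-point income process below are illustrative assumptions, not the paper’s calibration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not the paper's calibration)
beta, R = 0.96, 1.02                  # discount factor, gross return on savings
assets = np.linspace(0.0, 10.0, 51)   # discretized asset grid
shares = np.linspace(0.05, 1.0, 20)   # consumption as a share of cash on hand
incomes = np.array([0.5, 1.0])        # low ("unemployed") vs. high income draws
probs = np.array([0.3, 0.7])

Q = np.zeros((len(assets), len(shares)))  # tabular stand-in for the neural net
alpha, eps = 0.1, 0.1                     # learning rate, exploration rate

def utility(c):
    return np.log(np.maximum(c, 1e-8))

a_idx = 0
for t in range(200_000):
    y = rng.choice(incomes, p=probs)
    cash = R * assets[a_idx] + y
    # epsilon-greedy choice over consumption shares
    if rng.random() < eps:
        s_idx = rng.integers(len(shares))
    else:
        s_idx = int(np.argmax(Q[a_idx]))
    c = shares[s_idx] * cash
    a_next = np.clip(cash - c, assets[0], assets[-1])
    a_next_idx = int(np.abs(assets - a_next).argmin())
    # one-step Q-learning update toward utility plus discounted continuation value
    target = utility(c) + beta * Q[a_next_idx].max()
    Q[a_idx, s_idx] += alpha * (target - Q[a_idx, s_idx])
    a_idx = a_next_idx
```

The point of the sketch is only that consumption choices are driven by an approximate value function updated from experienced income paths, which is the channel through which past unemployment spells can leave a lasting imprint on behavior.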
Bayesian Exploration Networks
Mattie Fellows*, Brandon Kaplowitz*, Christian Schroeder de Witt, Shimon Whiteson (* indicates equal contribution)
International Conference on Machine Learning 2024
reinforcement learning, Bayesian RL
Bayesian reinforcement learning (RL) offers a principled and elegant approach for sequential decision making under uncertainty. Most notably, Bayesian agents do not face an exploration/exploitation dilemma, a major pathology of frequentist methods. However, theoretical understanding of model-free approaches is lacking. In this paper, we introduce a novel Bayesian model-free formulation and the first analysis showing that model-free approaches can yield Bayes-optimal policies. We show all existing model-free approaches make approximations that yield policies that can be arbitrarily Bayes-suboptimal. As a first step towards model-free Bayes optimality, we introduce the Bayesian exploration network (BEN), which uses normalising flows to model both the aleatoric uncertainty (via density estimation) and epistemic uncertainty (via variational inference) in the Bellman operator. In the limit of complete optimisation, BEN learns true Bayes-optimal policies, but, as in variational expectation-maximisation, partial optimisation renders our approach tractable. Empirical results demonstrate that BEN can learn true Bayes-optimal policies in tasks where existing model-free approaches fail.
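A rough illustration of the idea, not the paper’s implementation: a single conditional affine “flow” models the distribution of bootstrapped Bellman targets given a state-action input (aleatoric uncertainty), while a mean-field Gaussian variational posterior over one set of weights captures epistemic uncertainty. The class name, architecture sizes, and KL weight below are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class TinyBEN(nn.Module):
    def __init__(self, sa_dim, hidden=32):
        super().__init__()
        # network producing the affine flow parameters (shift, log-scale)
        self.net = nn.Sequential(nn.Linear(sa_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2))
        # mean-field variational posterior over one extra shift weight vector
        self.w_mu = nn.Parameter(torch.zeros(sa_dim))
        self.w_logstd = nn.Parameter(torch.full((sa_dim,), -2.0))

    def log_prob(self, sa, target):
        # sa: (batch, sa_dim); target: (batch, 1) bootstrapped Bellman targets
        # sample weights by reparameterisation -> epistemic uncertainty
        w = self.w_mu + self.w_logstd.exp() * torch.randn_like(self.w_mu)
        shift, log_scale = self.net(sa).chunk(2, dim=-1)
        shift = shift + sa @ w.unsqueeze(-1)          # epistemic contribution
        # affine flow: target = shift + exp(log_scale) * eps, eps ~ N(0, 1)
        eps = (target - shift) * torch.exp(-log_scale)
        base = torch.distributions.Normal(0.0, 1.0)
        # change-of-variables log-density of the affine flow
        return base.log_prob(eps).sum(-1) - log_scale.sum(-1)

def loss(model, sa, target, kl_weight=1e-3):
    # negative log-likelihood of targets (aleatoric fit) ...
    nll = -model.log_prob(sa, target).mean()
    # ... plus closed-form KL of the weight posterior against a N(0, I) prior
    kl = 0.5 * (model.w_mu**2 + (2 * model.w_logstd).exp()
                - 2 * model.w_logstd - 1).sum()
    return nll + kl_weight * kl
```

In the full method, uncertainty in the learned Bellman model is what drives exploration; this sketch only shows how the two kinds of uncertainty can be separated within a single variational objective.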
Works in Progress
Learning for Imperfect Information Adversarial Games (Gravel)
Modelling Opinion Dynamics at Scale Using Deep MARL
Early Drafts and In Progress Papers
Learning
Value-Based Reinforcement Learning Matches Otherwise Challenging-to-Explain Consumption Patterns Out of Covid Stimulus Payments (JMP)
Inequality & Heterogeneity
(Early Work) The Effect of Local Economic Shocks in a Geospatial Model with Moving (w/ Man Chon Iao, Doruk Gokalp)
Statistics/ML
- The AI Macroeconomy: A New Set of Benchmarks for Multiagent RL Models (Workshop Presentation, ICML 2022)
- “Bayesian Exploration Networks” (Mattie Fellows*, Brandon Kaplowitz*, Christian Schroeder de Witt, Shimon Whiteson), ICML 2024: https://arxiv.org/abs/2308.13049 (* Equal Contribution)
- Early Work (planned for NeurIPS 2024): “Provable Convergence to Nash Equilibria with Public Beliefs in Large Imperfect Information Extensive-Form Games” (Coauthors: Gabriele Farina, Sobhan Mohammadpour, Sam Sokota). A follow-up to both ReBeL (Brown et al. 2020) and “Abstracting Imperfect Information Away from Two-Player Zero-Sum Games” (Sokota et al. 2023), this work establishes proofs and rates of convergence for the Public Belief Game and uses them to develop an improved deep RL algorithm, which we test on large instances of “Liar’s Dice”, a game in the poker family that is infeasible to solve via traditional counterfactual regret minimization. We efficiently achieve near-zero regret, a new SOTA result. Economically, this opens up the possibility of studying large models of imperfect information with learning and strategic incentives, such as those that arise during stock-market crashes, Federal Reserve guidance, or other informational shocks. A toy regret-minimization sketch follows below.
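To make “near-zero regret” concrete, the sketch below runs textbook regret matching on rock-paper-scissors, a vastly smaller setting than the extensive-form games above; the payoff matrix and iteration count are illustrative, and none of the public-belief or deep RL machinery of the draft appears here.

```python
import numpy as np

# Row player's payoff for rock, paper, scissors (zero-sum, column player gets the negative)
payoff = np.array([[0., -1., 1.],
                   [1., 0., -1.],
                   [-1., 1., 0.]])
regrets = [np.zeros(3), np.zeros(3)]
strategy_sums = [np.zeros(3), np.zeros(3)]

def current_strategy(r):
    # regret matching: play actions in proportion to positive cumulative regret
    pos = np.maximum(r, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(3, 1.0 / 3.0)

for t in range(10_000):
    s = [current_strategy(regrets[0]), current_strategy(regrets[1])]
    for i in range(2):
        strategy_sums[i] += s[i]
    # expected utility of each pure action against the opponent's current mix
    u = [payoff @ s[1], -payoff.T @ s[0]]
    for i in range(2):
        regrets[i] += u[i] - s[i] @ u[i]   # accumulate per-action regret

avg = [ss / ss.sum() for ss in strategy_sums]
print("average strategies:", avg)          # both approach (1/3, 1/3, 1/3)
```

Average regret per iteration shrinks toward zero and the time-averaged strategies approach the uniform Nash equilibrium; counterfactual regret minimization extends this regret-matching idea to the information sets of extensive-form games.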