Tutorial Description


REINFORCEMENT LEARNING IN FINANCE
by JOHN MOODY


Abstract
This tutorial will provide an introduction to Reinforcement Learning (RL) methods and their application to dynamic optimization in finance.

A wide range of important financial problems involve optimization over time. Examples include inter-temporal portfolio management, asset-liability modeling, personal financial planning, binomial tree option pricing, early exercise of options, asset allocation, trading and market making. Such problems have traditionally been solved using methods of dynamic programming (DP) in discrete time or stochastic control in continuous time.

Reinforcement learning has been developed in the machine learning, neural computation and control engineering communities. RL considers goal-directed, adaptive agents that seek to maximize reward through interaction with an uncertain environment. RL algorithms provide an attractive alternative to DP, since they typically do not assume a complete model of the underlying system and are often much more efficient. Hence, RL is a potentially powerful approach for solving challenging financial problems.

The dominant RL paradigm of the past 50 years derives from dynamic programming: the RL agent learns a Value Function (VF) that estimates the long-term reward obtainable from each state. VF algorithms such as TD(lambda) and Q-Learning are theoretically appealing, and have proven effective for certain problems in computer games, telecommunications and robotics. However, they are frequently found to be inefficient or non-robust in practice.
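To make the VF idea concrete, the sketch below implements tabular Q-Learning on a hypothetical toy problem (a four-state chain where moving right eventually pays a reward); the environment, hyperparameters and episode cap are illustrative assumptions, not part of the tutorial itself.

```python
import random

def q_learning(n_states, n_actions, step, episodes=2000,
               alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-Learning: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(200):  # cap episode length (illustrative choice)
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q

# Hypothetical toy chain MDP: states 0..3; action 1 moves right, action 0
# moves left; reaching state 3 pays reward 1 and ends the episode.
def step(s, a):
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

Q = q_learning(4, 2, step)
```

After training, the greedy policy with respect to Q moves right in every state, i.e. the agent has recovered the optimal policy from reward alone, without a model of the transition dynamics.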

An alternative approach, Direct Reinforcement (DR), has recently been revisited, whereby the DR agent learns a strategy directly. DR can enable a simpler problem representation, avoid Bellman's "curse of dimensionality", and offer compelling advantages in efficiency. Recently developed DR methods include policy gradient, policy search and the RRL (recurrent reinforcement learning) algorithm.
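The flavor of Direct Reinforcement can be conveyed by a deliberately stripped-down sketch: a one-parameter trader whose position is a direct function of the last return, trained by gradient ascent on average profit. This is a hypothetical simplification for illustration only (no transaction costs, no recurrent term, synthetic data), not the RRL algorithm as presented in the tutorial.

```python
import math, random

def train_direct_trader(returns, epochs=200, lr=0.5):
    """Direct Reinforcement sketch: position F_t = tanh(w * r_t), and we
    ascend the gradient of the average trading return F_{t-1} * r_t.
    No value function is learned; the policy parameter w is learned directly."""
    w = 0.0
    T = len(returns)
    for _ in range(epochs):
        grad = 0.0
        for t in range(1, T):
            F_prev = math.tanh(w * returns[t - 1])
            # d/dw [ F_{t-1} * r_t ] = r_t * (1 - F_prev**2) * r_{t-1}
            grad += returns[t] * (1 - F_prev ** 2) * returns[t - 1]
        w += lr * grad / (T - 1)
    return w

# Synthetic positively autocorrelated (trending) return series -- an
# assumption chosen so that a momentum rule is in fact profitable.
rng = random.Random(0)
r, returns = 0.0, []
for _ in range(500):
    r = 0.9 * r + 0.1 * rng.gauss(0, 1)
    returns.append(r)

w = train_direct_trader(returns)
profit = sum(math.tanh(w * returns[t - 1]) * returns[t]
             for t in range(1, len(returns)))
```

On trending data the learned weight is positive (a momentum strategy) and the in-sample trading profit is positive; the point of the sketch is that the strategy is represented and optimized directly, with no intermediate value function.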

This tutorial will describe essential ideas of value function and direct reinforcement methods. We will review the application of VF reinforcement to financial problems, and demonstrate how DR can be used to optimize portfolios, asset allocations and trading systems. We will show how DR traders can be designed to discover strategies that maximize profit, economic utility or risk-adjusted returns. Illustrations include a monthly asset allocation system that maximizes the Sharpe ratio and an intraday currency trader that minimizes downside risk.
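For reference, the Sharpe ratio used as an optimization objective above is the mean return divided by its standard deviation, scaled by the square root of the number of periods per year. A minimal sketch (assuming monthly returns and, for simplicity, a zero risk-free rate):

```python
import math

def sharpe_ratio(returns, periods_per_year=12):
    """Annualized Sharpe ratio: mean return over its sample standard
    deviation, scaled by sqrt(periods_per_year). Risk-free rate taken
    as zero for simplicity (an assumption of this sketch)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((x - mean) ** 2 for x in returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)
```

For example, monthly returns of 1%, 2% and 3% have mean 2% and standard deviation 1%, giving an annualized Sharpe ratio of 2 * sqrt(12).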

Finally, we consider a central question in current RL research: "Should one learn the value function or learn the policy?"

Biography
John Moody is a Professor of Computer Science at the OGI School of Science & Engineering. His research interests include machine learning, neural and statistical computing, time series analysis and computational finance. He founded OGI's Computational Finance program in early 1996, served as its Director until June 2002, and received the first-ever OGI Faculty Award for Educational Contribution in June 2001. He previously served as Program Co-Chair of Computational Finance 2000 (London) and Program Chair of Neural Information Processing Systems (NIPS). Prior to joining OGI, he held positions in Computer Science and Neuroscience at Yale University and at the Institute for Theoretical Physics in Santa Barbara. Moody received his B.A. in Physics from the University of Chicago, and earned his Ph.D. in Theoretical Physics at Princeton.


