This paper demonstrates the capabilities of Deep Reinforcement Learning (DRL) algorithms in financial portfolio management. The field has developed rapidly in recent years, driven by increased computational power and growing research on sequential decision making grounded in control theory.
In this paper we have designed a trading environment that is compatible with the OpenAI Gym framework. It simulates real market behavior and can be used to assess different portfolio optimization strategies. It is also used to train the reinforcement learning agents (DDPG and DQN). At each time step, an agent acts in the environment by allocating the weights of the stocks in the portfolio.
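The interface of such an environment can be illustrated with a minimal sketch. It is written without the gym dependency so that it runs standalone, but it follows the classic Gym convention in which reset() returns an observation and step(action) returns (observation, reward, done, info). The class name PortfolioEnvSketch, the normalised price-window observation, and the log-return reward without transaction costs are assumptions made for illustration; they are not the environment described in this paper.

```python
import math


class PortfolioEnvSketch:
    """Hypothetical Gym-style portfolio environment (illustrative, not the paper's code).

    Assumptions: `prices` is a list of per-step price vectors (one entry per stock),
    an action is a vector of non-negative weights, and the reward is the log return
    of the rebalanced portfolio over one step, with no transaction costs.
    """

    def __init__(self, prices, window=3):
        self.prices = prices
        self.window = window
        self.t = window

    def _observation(self):
        # Last `window` price vectors, each divided by the most recent prices
        latest = self.prices[self.t - 1]
        rows = self.prices[self.t - self.window:self.t]
        return [[p / l for p, l in zip(row, latest)] for row in rows]

    def reset(self):
        self.t = self.window
        return self._observation()

    def step(self, action):
        # Normalise the action so the stock weights sum to 1 (long-only portfolio)
        total = sum(action)
        weights = [a / total for a in action]
        prev, curr = self.prices[self.t - 1], self.prices[self.t]
        # Portfolio growth over one step is the weighted sum of price relatives
        growth = sum(w * c / p for w, c, p in zip(weights, curr, prev))
        self.t += 1
        done = self.t >= len(self.prices)
        return self._observation(), math.log(growth), done, {"weights": weights, "growth": growth}
```

An agent calls reset() once and then step() until done is True. A fixed-rule agent computes the action from past prices, while a DRL agent produces it from its policy network.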
We have implemented Deep Reinforcement Learning models that act as autonomous portfolio optimization agents. In particular, we focus on the Deep Deterministic Policy Gradient (DDPG) and Deep Q-Network (DQN) algorithms. Both are model-free reinforcement learning algorithms that learn the quality (value) of actions and use it to tell an agent which action to take in a given state.
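Learning "the quality of actions" means estimating the action-value function Q(s, a). The sketch below contrasts the bootstrap targets that the two algorithms regress towards: DQN takes a maximum over a discrete action set, while DDPG replaces that maximum with the action proposed by a deterministic target actor, which is what allows it to output continuous portfolio weights. It also shows the soft (Polyak) target-network update used by DDPG. The function names and the default values of gamma and tau are illustrative assumptions, and the networks are stand-in Python callables rather than real models.

```python
def dqn_target(reward, next_q_values, done, gamma=0.99):
    """DQN target: y = r + gamma * max_a' Q_target(s', a') for a non-terminal s'.

    `next_q_values` holds the target network's Q-values for every discrete action in s'.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def ddpg_target(reward, next_state, target_actor, target_critic, done, gamma=0.99):
    """DDPG critic target: y = r + gamma * Q'(s', mu'(s')) for a non-terminal s'.

    The max over actions is replaced by the target actor's deterministic action.
    """
    if done:
        return reward
    next_action = target_actor(next_state)
    return reward + gamma * target_critic(next_state, next_action)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging used by DDPG: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]
```

The soft update keeps the target networks slowly moving, which stabilises the bootstrapped targets that both critics are trained on.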
We have performed a comparative analysis of the reinforcement-learning-based optimization strategy against the more traditional "Follow the Winner", "Follow the Loser", "Random" and "Uniformly Balanced" strategies to determine which of them performs best.
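The baseline strategies can be sketched as simple weight-allocation rules. The exact rules used in this paper are not given in this section, so the functions below are simplified, hypothetical versions: Follow the Winner and Follow the Loser put all weight on the best or worst performer of the previous period (published variants usually shift weight gradually instead), Random draws weights uniformly from the simplex, and Uniformly Balanced holds 1/N in each stock.

```python
import random


def uniform_weights(n_assets):
    """Uniformly Balanced: equal weight 1/N in every stock."""
    return [1.0 / n_assets] * n_assets


def random_weights(n_assets, rng=random):
    """Random: normalised Exp(1) draws give a Dirichlet(1, ..., 1) sample,
    i.e. weights distributed uniformly over the simplex."""
    raw = [rng.expovariate(1.0) for _ in range(n_assets)]
    total = sum(raw)
    return [x / total for x in raw]


def follow_the_winner(last_returns):
    """Follow the Winner (simplified): all weight on last period's best performer."""
    best = max(range(len(last_returns)), key=lambda i: last_returns[i])
    return [1.0 if i == best else 0.0 for i in range(len(last_returns))]


def follow_the_loser(last_returns):
    """Follow the Loser (simplified): all weight on last period's worst performer,
    betting on mean reversion."""
    worst = min(range(len(last_returns)), key=lambda i: last_returns[i])
    return [1.0 if i == worst else 0.0 for i in range(len(last_returns))]
```

Because each rule returns a weight vector, any of them can be passed directly as the action of a Gym-style trading environment, which is what makes a like-for-like comparison with the DRL agents possible.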
Table of Contents
List of abbreviations
Notation
Introduction
Part 1. Theoretical Part (Background)
Chapter 1. Deep Learning
1.1 Perceptron
1.2 Neural Network
1.3 Activation Function
1.4 Loss Function
1.5 Backpropagation
1.6 Optimization Algorithms
1.7 Gradient Descent Optimization Algorithms
1.8 Overfitting
Chapter 2. Reinforcement Learning
2.1 Key Concepts
2.2 Taxonomy of RL Algorithms
Chapter 3. Deep Reinforcement Learning
3.1 Vanilla Policy Gradient (VPG)
3.2 Deep Deterministic Policy Gradient (DDPG)
Chapter 4. Financial Theory
4.1 Financial Terms and Concepts
4.2 Statistical Moments
Part 2. Practical Part
Chapter 5. Trading Environment
5.1 OpenAI Gym
5.2 MDP Model
5.3 Action Space
5.4 State and Observation Space
5.5 Reward Signal
5.6 Trading Environment Implementation
5.7 Dataset
Chapter 6. Trading Agents
6.1 Base Agent
6.2 Regular Agents
6.3 DQN Agent
6.4 DDPG Agent
Chapter 7. Experiments
7.1 Experiments in OpenAI Gym Environment
7.2 Experiments in Trading Environment
Results
Conclusions
References
Appendix 1. Trading environment package listing
Appendix 2. DDPG Agent package listing
Appendix 3. Financial metrics package listing