Risk-Averse Allocation Indices for Multi-Armed Bandit Problem

Assoc. Prof. Dr. Özlem Çavuş

Department of Industrial Engineering, Bilkent University, Turkey

Invited by: Sevtap Kestel
Place: https://zoom.us/j/98408006920?pwd=eHQ2YzNMakxFamhDL1k1eDRTTURIQT09
Zoom Meeting ID: 984 0800 6920
Passcode:  724032
Date/Time: 04.05.2021, 15:30-16:30

Abstract: In the classical multi-armed bandit problem, the aim is to find a policy maximizing the expected total reward, implicitly assuming that the decision maker is risk-neutral. In some real-life applications, however, decision makers are risk-averse. In this study, we design a new setting based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the multi-armed bandit problem with respect to this novel setting, and propose a priority-index heuristic which gives risk-averse allocation indices having a structure similar to the Gittins index. Although an optimal policy is shown not always to have an index-based form, empirical results demonstrate the strong performance of this heuristic and show that with risk-averse allocation indices we can achieve optimal or near-optimal interpretable policies.


Last Updated:
05/05/2021 - 10:08