Lecture title: Empirical Gittins Index Strategies with ε-Explorations for Multi-armed Bandit Problems
Speaker: Professor Wu Xianyi (吳賢毅), East China Normal University
Time: Thursday, May 18, 2023, 13:40-14:40
Venue: Conference Room 402, College Building No. 6
Organizer: Zhejiang Province 2011 "Collaborative Innovation Center for Data Science and Big Data Analysis"
Abstract:
The machine learning/statistics literature has so far largely considered multi-armed bandit (MAB) problems in which the rewards from every arm are assumed to be independent and identically distributed. For more general MAB models in which every arm evolves according to a rewarded Markov process, it is well known that the optimal policy is to pull an arm with the highest Gittins index. When the underlying distributions are unknown, an empirical Gittins index rule with ε-exploration (abbreviated as the empirical ε-Gittins index rule) is proposed to solve such MAB problems. This procedure combines ε-exploration (for exploration) with empirical Gittins indices (for exploitation), which are computed by applying the Largest-Remaining-Index algorithm to the estimated underlying distribution. Convergence is established both for the empirical Gittins indices, which converge to the true Gittins indices, and for the expected discounted total rewards of the empirical ε-Gittins index rule, which converge to those of the oracle Gittins index rule. A numerical simulation study illustrates the behavior of the proposed policies, and their performance compared with the ε-mean reward is discussed.
About the speaker:
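Wu Xianyi, Ph.D., is a professor in the School of Statistics, Faculty of Economics and Management, East China Normal University. Professor Wu's research and teaching cover statistics, machine learning/artificial intelligence, non-life actuarial science, and stochastic scheduling. Professor Wu has published more than 70 papers in mainstream international academic journals, along with two monographs and one textbook with publishers in China and abroad.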
Interested faculty and students are warmly welcome to attend!