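Seminar title: Double-Splitting Model Aggregation for High-Dimensional Massive Data
Speaker: Professor Yanyan Liu (劉妍巖), Wuhan University
Time: Thursday, December 8, 2022, 14:30-15:30
Venue: Tencent Meeting (469-612-022)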
Organizer: Zhejiang Province 2011 "Collaborative Innovation Center for Data Science and Big Data Analysis"
Abstract:
Massive data often combine high dimensionality with a large sample size, so they typically cannot be stored on a single machine, which makes both analysis and prediction challenging. We propose a distributed gridding model aggregation (DGMA) approach for predicting the conditional mean of a response variable, which overcomes both the storage limitation of a single machine and the curse of dimensionality. Specifically, on each local machine, which stores a part of the data with a relatively moderate sample size, we develop a model aggregation approach based on splitting the predictors, for which a greedy algorithm is developed. To obtain the optimal weights across all local machines, we further design a distributed, communication-efficient algorithm. Our procedure effectively distributes the workload and dramatically reduces the communication cost. Extensive numerical experiments on both simulated and real datasets demonstrate the feasibility of the DGMA method.
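About the speaker:
Yanyan Liu (劉妍巖) is a professor and doctoral supervisor at the School of Mathematics and Statistics, Wuhan University. She received her Ph.D. in science from Wuhan University in 2001. Her main research interests include survival analysis, semiparametric statistical inference, model structure selection for complex high-dimensional data, and statistical analysis techniques for big data. She has made short-term visits to, and worked at, the University of North Carolina at Chapel Hill (USA), Simon Fraser University (Canada), the Hong Kong Polytechnic University, the Chinese University of Hong Kong, and the University of Greifswald (Germany), among other institutions. She has led and completed six projects funded by the National Natural Science Foundation of China and the Ministry of Education, and has published more than sixty SCI-indexed research papers in journals including the Journal of Machine Learning Research, Biometrics, Biostatistics, Genetics, and Lifetime Data Analysis. She currently serves as an associate editor of the international statistics journal Statistical Papers, an associate editor of the Journal of Applied Statistics and Management (數理統計與管理) (2022.01-2025.12), a standing council member of the 11th Council of the Chinese Association for Applied Statistics (中國現場統計學會), and a member of the Working Committee for Women Experts of the Chinese Mathematical Society.
Interested faculty and students are warmly welcome to attend!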