FAI-Seminar: Previous Talks


2024 R02
Time | Speaker | Talk Title | Talk Info | Paper | Video
07/19 | 张博航 (Peking University) | Beyond Weisfeiler-Lehman: A Quantitative Framework for GNN Expressiveness | Talk Info | [1], [2], [3] | Bilibili
08/09 | 黎善达 (CMU) | Inference Scaling Law of Large Language Models and Second-Prize Winning Solution of AIMO | Talk Info | [1], [2] | Bilibili
08/16 | 王天浩 (TTIC) | Tractable training dynamics of transformers for in-context learning | Talk Info | [1], [2] | Bilibili
08/23 | 吴京风 (Berkeley) | Reimagining Gradient Descent: Large Stepsize, Oscillation, and Acceleration | Talk Info | [1] | Bilibili
08/30 | 马梓业 (City University of Hong Kong) | Navigating the non-convex landscape via amplifying escape directions of saddle points | Talk Info | [1], [2], [3] | Bilibili
11/01 | 刘勇 (Renmin University of China) | Can Retrieval Augmented Generation (RAG) Enhance the LLM’s Reasoning Capabilities? | Talk Info | - | Bilibili

2024 R01
Time | Speaker | Talk Title | Talk Info | Paper | Video
Special talk 05/31 | 李建 (Tsinghua University) | Generalization Error and Implicit Bias of Gradient Methods in Deep Learning | Talk Info | - | Bilibili
03/08 | 翟润天 (CMU) | On the Generalization of Representation Learning and Big Foundation Models | Talk Info | [1], [2] | Bilibili
03/15 | 罗胜杰 (Peking University) | Enabling Efficient Equivariant Operations in the Fourier Basis via Gaunt Tensor Products | Talk Info | [1] | Bilibili
03/22 | 高天宇 (Princeton) | Long-Context Language Modeling with Parallel Context Encoding | Talk Info | - | Bilibili
03/29 | 邹荻凡 (The University of Hong Kong) | Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo | Talk Info | [1] | Bilibili
04/05 | 陆一平 (NYU) | Simulation-Calibrated Scientific Machine Learning | Talk Info | [1] | Bilibili
04/12 | 俞鼎力 (Princeton) | Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks | Talk Info | [1] | Bilibili
04/19 | 吕凯风 (Princeton) | Understanding the Limitations of Neural Networks on Algorithmic Reasoning | Talk Info | [1], [2] | Bilibili
04/26 | 李禹辰 (CMU) | Towards Mathematical Understanding of Modern Language Models | Talk Info | [1], [2], [3], [4] | Bilibili

2023 R03
Time | Speaker | Talk Title | Talk Info | Paper | Video
Special talk 02/16 | 胡威 (UMich) | Hidden Structures in Neural Network Representations | Talk Info | [1], [2] | Bilibili
11/10 | 陈乐偲 (Tsinghua University) | Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles | Talk Info | [1] | Bilibili
11/17 | 张博航 (Peking University) | Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective | Talk Info | [1] | Bilibili
11/24 | 顾欣然 (Tsinghua University) | A Quadratic Synchronization Rule for Distributed Deep Learning | Talk Info | [1] | Bilibili
12/01 | 石佳欣 (DeepMind) | MultiresConv: From Wavelet Theory to Long Context Modeling with Neural Networks | Talk Info | [1] | Bilibili
12/08 | 范凤磊 (The Chinese University of Hong Kong) | In Pursuit of Deciphering ReLU Networks and Beyond | Talk Info | [1] | Bilibili
12/15 | NeurIPS break
12/22 | 刘冰彬 (CMU) | Thinking Fast with Transformers: algorithmic reasoning with shortcuts | Talk Info | [1] (ICLR '23 oral), [2] (NeurIPS '23 spotlight) | Bilibili
12/29 | 温凯越 (Tsinghua University) | Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars | Talk Info | [1] | Bilibili
01/12 | 游凯超 (Tsinghua University) | Understand, Learn, and Adopt the PyTorch compiler (torch.compile) | Talk Info | [1], [2], [3] | Bilibili

2023 R02
Time | Speaker | Talk Title | Paper | Video
(Special) 09/15 | 李志远 (Stanford) | The Generalization Benefit of Flatness Regularization | [1], [2] | Bilibili
06/23 | 张博航 (Peking University) | Understanding the Expressivity of Subgraph-based GNNs for Graph Learning | [1] | Bilibili
06/30 | 罗胜杰 (Peking University) | One Transformer Can Understand Both 2D & 3D Molecular Data | [1] | Bilibili
07/07 | 刘子鸣 (MIT) | Intelligence from hunger | [1], [2] | Bilibili
07/14 | 马鉴昊 (UMich) | Robust Sparse Mean Estimation | [1] | Bilibili
07/21 | 金及凯 (Peking University) | Minimax optimal operator learning | [1] | Bilibili
07/28 | ICML break
08/04 | 王博涵 (University of Science and Technology of China) | When and Why Momentum Accelerates SGD | [1] | Bilibili
08/11 | 滕佳烨 (Tsinghua University) | Predictive inference with feature conformal prediction | [1] | Bilibili
08/18 | 蔡天乐 (Princeton) | Large Language Models as Tool Makers | [1] | Bilibili

2023 R01
Time | Speaker | Talk Title | Paper | Video
(Special) 05/26 | 张景昭 (Tsinghua University) | Two Phases of Scaling Laws for Nearest Neighbor Classifiers | [1] | Bilibili
03/03 | 张鼎怀 (Mila) | GFlowNets: Exploration for Probabilistic Inference | [1], [2], [3], [4] | Bilibili
03/10 | 顾欣然 (Tsinghua University) | Why (and When) does Local SGD Generalize Better than SGD | [1] | Bilibili
03/17 | 王博涵 (University of Science and Technology of China) | Provable Benefit of Adaptivity in ADAM | [1] | Bilibili
03/24 | 温凯越 (Tsinghua University) | How Does Sharpness-Aware Minimization Minimize Sharpness? | [1] | Bilibili
03/31 | 张博航 (Peking University) | Rethinking the Expressive Power of GNNs via Graph Biconnectivity | [1] (ICLR 2023 Outstanding Paper) | Bilibili
04/07 | 马鉴昊 (UMich) | Escaping Saddle Points Or Not? | [1], [2] | Bilibili
04/14 | 陈乐偲 (Fudan University) | On Bilevel Optimization without Lower-level Strong Convexity | [1] | Bilibili
04/21 | 黄凯旋 (Princeton) | Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data | [1] | Bilibili
04/28 | 戴言 (Tsinghua University) | Variance-Aware Sparse Linear Bandits | [1] | Bilibili