Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model Training

Weimin Wu; Maojiang Su; Jerry Yao-Chieh Hu; Zhao Song; Han Liu

May 2025 · Weimin Wu, Maojiang Su, Jerry Yao-Chieh Hu, Zhao Song, Han Liu · Published in the International Conference on Machine Learning (ICML), 2025.

Abstract

We investigate the transformer’s capability for in-context learning (ICL) to simulate the training process of deep models. Our key contribution is a positive example of using a transformer to train a deep neural network by gradient descent implicitly, via ICL.
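To make "gradient descent in an implicit fashion via ICL" concrete, the sketch below uses a much simpler, well-known toy case rather than the paper's deep-network construction: linear regression with a single softmax-free (linear) attention readout. Assuming the attention projections are set so that the score between the query and context token i is the plain dot product x_i · x_query and the output is scaled by lr / n, the attention output equals the prediction of a linear model after one explicit gradient descent step from w = 0. All function names and the example data are hypothetical and chosen for illustration only.

```python
# Toy illustration only (NOT the paper's construction for deep networks):
# for linear regression, one gradient descent step from w = 0 yields the same
# query prediction as a single linear (softmax-free) self-attention readout.
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def gd_one_step_prediction(xs: List[Vector], ys: List[float],
                           x_query: Vector, lr: float) -> float:
    """Explicit training: one GD step on L(w) = 1/(2n) * sum_i (w.x_i - y_i)^2, starting at w = 0."""
    n, d = len(xs), len(x_query)
    w = [0.0] * d
    # Gradient of the mean squared loss: (1/n) * sum_i (w.x_i - y_i) * x_i
    grad = [0.0] * d
    for x, y in zip(xs, ys):
        residual = dot(w, x) - y
        for j in range(d):
            grad[j] += residual * x[j] / n
    # Single gradient descent update, then predict on the query point
    w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return dot(w, x_query)


def linear_attention_prediction(xs: List[Vector], ys: List[float],
                                x_query: Vector, lr: float) -> float:
    """Implicit training: the query token attends to the context tokens.

    Assumption (hypothetical weight choice): keys are x_i, values are y_i,
    the query is x_query, scores are unnormalized dot products, and the
    output is scaled by lr / n so that it matches the explicit GD step.
    """
    n = len(xs)
    return (lr / n) * sum(y * dot(x, x_query) for x, y in zip(xs, ys))


# Hypothetical in-context examples (x_i, y_i) and a query point
xs = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
ys = [2.0, -1.0, 0.5]
x_query = [0.5, 1.5]
lr = 0.1

explicit = gd_one_step_prediction(xs, ys, x_query, lr)
implicit = linear_attention_prediction(xs, ys, x_query, lr)
print(explicit, implicit)
```

Both values agree (about -0.0333, up to floating-point rounding) because, from w = 0, one step gives w = (lr / n) * sum_i y_i x_i, so w · x_query = (lr / n) * sum_i y_i (x_i · x_query), which is exactly the linear attention sum. The paper's setting (multi-layer networks trained over several steps) is substantially harder than this single-layer, single-step case.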

Figure 1: Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model Training