
Transformers are Deep Optimizers: Provable In-Context Learning for Deep Model Training
This paper investigates whether a transformer can use in-context learning (ICL) to simulate the training process of a deep model, and gives an explicit construction for which this simulation is provable.
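To make "simulating the training process" concrete, the sketch below runs the kind of explicit training loop that an in-context learner would have to reproduce within its forward pass: the prompt supplies (x, y) examples, a small network is fitted to them, and the fitted network answers a query. It is not the paper's construction. The two-layer tanh network, the squared loss, plain gradient descent, and every name and hyperparameter (`gd_on_context`, `hidden`, `steps`, `lr`) are assumptions chosen for illustration.

```python
import math
import random


def forward(W1, w2, x):
    """Two-layer network f(x) = w2 . tanh(W1 x); returns output and hidden activations."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sum(a * hj for a, hj in zip(w2, h)), h


def mean_loss(W1, w2, examples):
    """Mean of 0.5 * (f(x) - y)^2 over the in-context examples."""
    total = sum(0.5 * (forward(W1, w2, x)[0] - y) ** 2 for x, y in examples)
    return total / len(examples)


def gd_on_context(examples, query, hidden=4, steps=200, lr=0.05, seed=0):
    """Fit the network to the in-context examples with gradient descent,
    then predict on the query. Hyperparameters are illustrative assumptions."""
    rng = random.Random(seed)
    dim = len(examples[0][0])
    n = len(examples)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    losses = [mean_loss(W1, w2, examples)]

    for _ in range(steps):
        grad_W1 = [[0.0] * dim for _ in range(hidden)]
        grad_w2 = [0.0] * hidden
        for x, y in examples:
            pred, h = forward(W1, w2, x)
            r = (pred - y) / n  # derivative of the mean loss w.r.t. this prediction
            for j in range(hidden):
                grad_w2[j] += r * h[j]
                back = r * w2[j] * (1.0 - h[j] ** 2)  # chain rule through tanh
                for k in range(dim):
                    grad_W1[j][k] += back * x[k]
        # One gradient descent step on all parameters.
        for j in range(hidden):
            w2[j] -= lr * grad_w2[j]
            for k in range(dim):
                W1[j][k] -= lr * grad_W1[j][k]
        losses.append(mean_loss(W1, w2, examples))

    return forward(W1, w2, query)[0], losses


# Hypothetical in-context examples drawn from y = 0.5 * x1 - 0.5 * x2.
context = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.5), ([1.0, 1.0], 0.0), ([-1.0, 0.5], -0.75)]
prediction, losses = gd_on_context(context, query=[0.5, -0.5])
print(f"loss before: {losses[0]:.4f}, after: {losses[-1]:.4f}, prediction: {prediction:.4f}")
```

In the ICL setting, the examples in `context` and the `query` would be packed into the transformer's input sequence, and the question is whether fixed transformer weights can carry out updates like the ones in this loop without any change to the transformer's own parameters.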