Greg Yang The Unreasonable Effectiveness

May 25, 2026

Media Summary: 23 March 2023 Abstract: Recently, the theory of infinite-width neural networks led to the first technology, muTransfer, for tuning ... DISCUSSION MEETING DATA SCIENCE: PROBABILISTIC AND OPTIMIZATION METHODS ORGANIZERS: Vivek Borkar (IIT Bombay, India), Sandeep Juneja (TIFR, India), Praneeth Netrapalli (Google Research India) and ...

Professor Andrew Briggs continued the Eugene P. Wigner Distinguished Lecture Series in Science, Technology and Policy on ...

How can one tune the hyperparameters of an enormous neural network like GPT-3 on a single GPU? ...

[LAFI'23] Introduction to the tensor-programs framework, a PL approach that helps analyse theoretical properties of deep learning.

ABSTRACT: You can't train GPT-3 on a single GPU, much less tune its hyperparameters (HPs)… or so it seems. I'm here to tell you ...

Stanford physics PhD candidate Natalie Paquette studies the interactions between physics and pure mathematics. In this talk, she ...

Jay comments on one of his favorite machine learning articles, which helped him break into NLP. This popular article ...

Session 5: Foundational Aspects of General Intelligence and AI Title: The