Lecture 12 Flash Attention - Detailed Analysis & Overview
In this video, I'll be deriving and coding ...

FlashAttention is an IO-aware algorithm for computing ...

Several LLMs have used long context: GPT-4 (32k), MosaicML's MPT (65k), Anthropic's Claude (100k). But ...

Episode 67 of the Stanford MLSys Seminar “Foundation Models Limited Series”! Speaker: Tri Dao. Abstract: Transformers are slow ...

ML Performance Reading Group Session 24 meeting recording. Paper: ... Speaker: Charles Frye. From the Modal team: ...