Lecture 13 Efficient LLM Inference

May 25, 2026

Media Summary: Intro to Modern AI online course. For more information and to enroll, please visit ... This video is the recording of the presentation delivered by me on 28th February on the topic of " ... For more information about Stanford's online Artificial Intelligence programs, visit: ... To learn more about ...

Lecture 13 Efficient LLM Inference - Detailed Analysis & Overview

... vLLM Semantic Router project creator - vLLM Semantic Router: Intelligent Auto Reasoning Router for ...

ChatGPT and similar conversational tools have become remarkably good at having conversations and answering questions.

Open-source LLMs are great for conversational applications, but they can be difficult to scale in production and deliver latency ...

In this AI Research Roundup episode, Alex discusses the paper: 'Llamas on the Web: Memory- ...

In this Advancing AI 2024 Luminary Developer Keynote, Dr. Lianmin Zheng introduces SGLang, a high-performance serving ...

Unpacks the complexities of Large Language Models. Episode 1 introduces foundational concepts like tokens, embeddings, and ...

Speaker: Maksim Khadkevich, Sr. Software Engineering Manager, Dynamo, NVIDIA. Khadkevich discusses data center scale ...

In this AI Research Roundup episode, Alex discusses the paper: 'Adaptively Robust ...

For more information about Stanford's graduate programs, visit: October 10, 2025 ...