Media Summary: Check out videos from Upperside Conference's recent World Congress (formerly known as MPLS World Congress): ... Todd Muirhead talks with Uday Kurkure and Lan Vu about recent tests of ... The provided text introduces LLM-D, an open-source project designed to ...

UWC26: Optimizing AI Inference Performance - Detailed Analysis & Overview

Talk: Everything You Need to Know About Reducing Voice-Agent Latency (by Philip Kiely @ Baseten). Rolling your own ...
Philip Kiely, Head of Developer Relations at Baseten, presents the “Golden Triangle” of ...
Learn how NVIDIA Dynamo and Kubernetes help scale high-...

How do you go from a state-of-the-art foundation model to a globally available usage-based API? This session provides an ...
Check out complete MWC Barcelona 2026 Showcase at: ...
Arrcus Unveils ...
Talk: Introductions and Meetup Updates by Chris Fregly and Antje Barth
Talk: Presented by Anton Kachatkou, Principal Software Engineer, Arm. Arm NPUs deliver high throughput and efficiency in ...
Master LLM core concepts! Explore MoE, RLHF, DPO alignment, FlashAttention, and LoRA fine-tuning. Learn about KV caching, ...
Summary: Victor Moreno, Product Manager for Cloud Networking at Google, discusses the critical role of networking in ...

#UWC26: Optimizing AI Inference Performance: Testing Networks at Scale
AI Inference: The Secret to AI's Superpowers
Extreme Performance Series 2026: AI Inference Performance on VCF 9
Databricks & Together AI on Inference, Optimization, & Hardware
LLM-D: Optimizing Distributed AI Inference with Intelligent Routing
Maximize LLM Inference Performance + Auto-Profile/Optimize PyTorch/CUDA Code
The Golden Triangle of Inference Optimization: Balancing Latency, Throughput, and Quality
Scaling AI Inference Performance in the Cloud with Nebius
Faster LLMs: Accelerate Inference with Speculative Decoding
RAG vs Fine-Tuning vs Prompt Engineering: Optimizing AI Models
Deploying scalable and reliable AI inference on Google Cloud
Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou