Scale YouTube

NDC Conferences: Stop Treating LLMs Like REST APIs - Jeff Fran & Jack Pearce - NDC London 2026

Your LLM projects might hum along in testing but totally tank in production because, surprise, LLMs are not your average stateless REST APIs! They're hungry, stateful beasts that gulp down GPU memory and cause mayhem with context, batching, and caching.
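
To put a rough number on that GPU memory appetite, here is a minimal back-of-the-envelope sketch. It assumes the per-request state is a key/value cache that grows linearly with context length, which is how transformer inference servers commonly work; the talk summary itself does not spell this out. The function name and the model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values) are hypothetical values chosen for illustration, not figures from the talk.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Estimate the key/value cache size for ONE request (hypothetical helper).

    Assumes one key tensor and one value tensor per layer, each holding
    num_kv_heads * head_dim values per token of context.
    """
    tensors_per_layer = 2  # one for keys, one for values
    return (tensors_per_layer * num_layers * num_kv_heads * head_dim
            * context_tokens * bytes_per_value)


# Hypothetical model shape, picked only to make the arithmetic easy to follow.
per_request = kv_cache_bytes(
    num_layers=32, num_kv_heads=8, head_dim=128, context_tokens=8192
)
concurrent_requests = 16
total = per_request * concurrent_requests

print(f"per request: {per_request / 1024**3:.1f} GiB")
print(f"{concurrent_requests} concurrent requests: {total / 1024**3:.1f} GiB")
```

Under these assumptions, each in-flight request pins its own slice of GPU memory for as long as its context is live. A stateless REST handler has no such per-request footprint, which is why batching and cache placement end up mattering so much.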

But fear not! This talk offers a lifeline: LLM-D's open-source sharding combined with a clever NVIDIA/AMD node pool. Get ready for live demos, handy YAML, and even a secret sauce to keep your token costs from going supernova.
