NVIDIA AI Inference Workshop

This immersive workshop is designed to help developers, data scientists, and AI practitioners learn how to accelerate, scale, and customize large language model (LLM) inference using NVIDIA’s full-stack AI ecosystem.

Across the technical deep-dive sessions, participants will explore how NVIDIA AI infrastructure and software (NVIDIA® TensorRT™, TensorRT-LLM, NVIDIA Dynamo, NVIDIA Run:ai, Kubernetes) combine to deliver high-performance, production-grade LLM applications. The workshop also features a dedicated session on developing Indic LLMs, enabling developers to leverage the latest breakthroughs in locally relevant language models.

Each module includes hands-on coding exercises, live demos, and real-world examples, ensuring that participants not only learn the theory but also gain deployable skills in scaling AI workloads for enterprise and research needs.

Who Should Attend

  • Developers and ML Engineers working on optimizing and deploying large-scale AI workloads.
  • Data Scientists seeking to accelerate inference pipelines and integrate LLMs into enterprise workflows.
  • Enterprise IT/AI Teams, including Global Capability Centers (GCCs) and Global Research Centers (GRCs), exploring scalable solutions for running generative AI workloads.
  • Indian LLM Developers and Researchers interested in fine-tuning and deploying Indic language models with NVIDIA’s ecosystem.

Key Learnings

  • Accelerate LLM Inference: Learn how to reduce inference latency and improve throughput using TensorRT and TensorRT-LLM, NVIDIA’s cutting-edge optimizations for large models.
  • Scale Seamlessly: Gain practical experience in orchestrating LLM workloads across GPUs, clusters, and hybrid cloud setups using NVIDIA Dynamo, NVIDIA Run:ai, and Kubernetes.
  • Build Indic AI Models: Get hands-on with Indic LLM development, leveraging the latest regional models to power local language applications.
  • Industry-Relevant Use Cases: Explore how Banking Financial Services and Insurance (BFSI), Global Capability Centers (GCCs), and Global Research Centers (GRCs) are adopting LLM-driven applications such as conversational AI, compliance automation, and generative analytics.
  • End-to-End Practical Skills: From optimization → scaling → deployment, participants will walk away with workflows they can immediately apply to their organization’s AI projects.

Learning Objectives

  • Understand NVIDIA’s end-to-end AI ecosystem (hardware, software, frameworks).
  • Optimize and accelerate LLM inference with TensorRT/TensorRT-LLM.
  • Deploy and scale inference pipelines across clusters using NVIDIA Dynamo, NVIDIA Run:ai, and Kubernetes.
  • Implement workflows for Indic LLM fine-tuning and deployment.
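
To give a flavor of the cluster-deployment topics above, a minimal Kubernetes manifest that schedules an LLM inference server onto GPU nodes might look like the sketch below. This is an illustrative assumption, not workshop material: the deployment name, container image, port, and replica count are placeholders; the `nvidia.com/gpu` resource is the standard mechanism exposed by the NVIDIA device plugin for Kubernetes.

```yaml
# Hypothetical example: a Deployment that requests one GPU per replica
# via the NVIDIA device plugin's nvidia.com/gpu extended resource.
# Image, name, and port are placeholders for illustration only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference            # placeholder name
spec:
  replicas: 2                    # scale out across GPU nodes
  selector:
    matchLabels:
      app: llm-inference
  template:
    metadata:
      labels:
        app: llm-inference
    spec:
      containers:
      - name: server
        image: example.com/llm-server:latest   # placeholder image
        ports:
        - containerPort: 8000
        resources:
          limits:
            nvidia.com/gpu: 1    # one GPU per pod replica
```

Because `nvidia.com/gpu` is a limit-only extended resource, the scheduler places each replica only on nodes with a free GPU; scaling throughput is then a matter of adjusting `replicas`, which is the pattern orchestration layers such as NVIDIA Run:ai build on.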

Pick Your Workshop