NVIDIA AI Inference Workshop

This immersive workshop is designed to help developers, data scientists, and AI practitioners learn how to accelerate, scale, and customize large language model (LLM) inference using NVIDIA’s full-stack AI ecosystem.

Across the technical deep-dive sessions, participants will explore how NVIDIA AI infrastructure and software (NVIDIA® TensorRT™, TensorRT-LLM, NVIDIA Dynamo, NVIDIA Run:ai, Kubernetes) combine to deliver high-performance, production-grade LLM applications. The workshop also features a dedicated session on developing Indic LLMs, enabling developers to leverage the latest breakthroughs in locally relevant language models.

Each module includes hands-on coding exercises, live demos, and real-world examples, ensuring that participants not only learn the theory but also gain deployable skills in scaling AI workloads for enterprise and research needs.

Who Should Attend

  • Developers and ML Engineers working on optimizing and deploying large-scale AI workloads.
  • Data Scientists seeking to accelerate inference pipelines and integrate LLMs into enterprise workflows.
  • Enterprise IT/AI Teams, including Global Capability Centers (GCCs) and Global Research Centers (GRCs), exploring scalable solutions for running generative AI workloads.
  • Indian LLM Developers and Researchers interested in fine-tuning and deploying Indic language models with NVIDIA’s ecosystem.

Key Learnings

  • Accelerate LLM Inference: Learn how to reduce inference latency and improve throughput using TensorRT and TensorRT-LLM, NVIDIA’s cutting-edge optimizations for large models.
  • Scale Seamlessly: Gain practical experience in orchestrating LLM workloads across GPUs, clusters, and hybrid cloud setups using NVIDIA Dynamo, NVIDIA Run:ai, and Kubernetes.
  • Build Indic AI Models: Get hands-on with Indic LLM development, leveraging the latest regional models to power local language applications.
  • Industry-Relevant Use Cases: Explore how Banking Financial Services and Insurance (BFSI), Global Capability Centers (GCCs), and Global Research Centers (GRCs) are adopting LLM-driven applications such as conversational AI, compliance automation, and generative analytics.
  • End-to-End Practical Skills: From optimization → scaling → deployment, participants will walk away with workflows they can immediately apply to their organization’s AI projects.

Learning Objectives

  • Understand NVIDIA’s end-to-end AI ecosystem (hardware, software, frameworks).
  • Optimize and accelerate LLM inference with TensorRT/TensorRT-LLM.
  • Deploy and scale inference pipelines across clusters using NVIDIA Dynamo, NVIDIA Run:ai, and Kubernetes.
  • Implement workflows for Indic LLM fine-tuning and deployment.
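
To give a flavor of the cluster-deployment topics above, a minimal Kubernetes manifest that schedules an LLM inference server onto GPU nodes might look like the sketch below. This is an illustrative assumption, not workshop material: the deployment name, container image, port, and replica count are placeholders; the `nvidia.com/gpu` resource is the standard mechanism exposed by the NVIDIA device plugin for Kubernetes.

```yaml
# Hypothetical example: a Deployment that requests one GPU per replica
# via the NVIDIA device plugin's nvidia.com/gpu extended resource.
# Image, name, and port are placeholders for illustration only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference            # placeholder name
spec:
  replicas: 2                    # scale out across GPU nodes
  selector:
    matchLabels:
      app: llm-inference
  template:
    metadata:
      labels:
        app: llm-inference
    spec:
      containers:
      - name: server
        image: example.com/llm-server:latest   # placeholder image
        ports:
        - containerPort: 8000
        resources:
          limits:
            nvidia.com/gpu: 1    # one GPU per pod replica
```

Because `nvidia.com/gpu` is a limit-only extended resource, the scheduler places each replica only on nodes with a free GPU; scaling throughput is then a matter of adjusting `replicas`, which is the pattern orchestration layers such as NVIDIA Run:ai build on.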

Pick Your Workshop