Mandatory Prerequisites

  • Participants should have basic knowledge of Python and containerized environments, plus experience working in Jupyter/Colab or similar notebook workflows.
  • Languages/tools: Python
  • Frameworks: PyTorch, TensorRT-LLM, Triton Inference Server™, SGLang, vLLM
  • Get ready with build.nvidia.com (a quick connectivity check is sketched after this list).
  • Bring a laptop with internet access (ideal minimum: 5 Mbps download / 1–2 Mbps upload) to ensure consistent access to the lab.
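
One way to get ready with build.nvidia.com ahead of the workshop is to verify that your API key works. The sketch below is a minimal, non-authoritative connectivity check: it assumes you have generated a key on build.nvidia.com (exported here as NVIDIA_API_KEY) and uses meta/llama-3.1-8b-instruct purely as an example model from the catalog.

```python
# Minimal connectivity check against the build.nvidia.com API catalog.
# Assumes an API key from build.nvidia.com exported as NVIDIA_API_KEY;
# the model name is only an example from the catalog.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # OpenAI-compatible endpoint
    api_key=os.environ["NVIDIA_API_KEY"],
)

resp = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Reply with 'ok' if you can hear me."}],
    max_tokens=8,
)
print(resp.choices[0].message.content)
```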

Get Started With Your AI Inference Journey

Discover how Tech Mahindra is collaborating with NVIDIA to be at the forefront of generative AI innovation.

From developing therapeutic molecules to building India’s sovereign LLM in Hindi and 37+ dialects, Tech Mahindra is using NVIDIA’s hardware and software stack to build the Nemotron-4-Mini-Hindi-4B model. Tech Mahindra’s work on Indus 2.0, extending to Bahasa Indonesia, is state of the art and built on NVIDIA AI inference software.

Speakers

Utkarsh Uppal

Senior Applied Deep Learning Solutions Architect

NVIDIA

Utkarsh specializes in building high-performance deep learning pipelines across domains like language and speech. His primary focus is developing end-to-end conversational AI systems, including training large language models (LLMs) from scratch, particularly for Indic languages, and building domain-specific reasoning models with enterprises. He also has deep expertise in designing and optimizing inference architectures for production, with a focus on low-precision formats (FP4, FP8), decoding strategies, and KV-cache optimizations.

Agenda

9:00 a.m.
Registration and Networking
10:00 a.m.
Welcome and Introduction to the NVIDIA Ecosystem

IISc Bangalore | Megh Makwana

10:30 a.m.
Accelerating LLM Inference With TensorRT and TensorRT-LLM

NVIDIA AI Blueprint: Bring Your LLM to NIM (Hands-On)

IISc Bangalore | Utkarsh Uppal

11:00 a.m.
Tea/Coffee Break
12:15 p.m.
Accelerating LLM Inference With TensorRT and TensorRT-LLM (Hands-On)

IISc Bangalore | Utkarsh Uppal

1:45 p.m.
Lunch Break
2:30 p.m.
Disaggregated Serving Using NVIDIA Dynamo

IISc Bangalore | Utkarsh Uppal

4:00 p.m.
Tea/Coffee Break
4:15 p.m.
Scaling LLM Inference Using DGX Cloud Lepton/NVCF

IISc Bangalore | Megh Makwana

5:00 p.m.
Closing and Networking
Time Zone: (UTC+05:30) Kolkata

Event Details

NVIDIA Hands-On Training on Inference

Friday, November 7, 2025

Venue

A.V.R. Auditorium, New Chemical Science Building
CV Raman Rd
Bangalore, KA 560012
India

Additional Resources

NVIDIA AI Enterprise Solutions

Explore the most advanced AI, ready for the enterprise, and the latest breakthroughs made possible with NVIDIA AI.

Learn More >

NVIDIA AI Inference Solutions

Greater AI performance and compounded returns. Think SMART. Think NVIDIA Inference.

Learn More >

NVIDIA Inference Performance

Inference can be deployed in many ways, depending on the use case. Offline processing of data is best done at larger batch sizes, which deliver optimal GPU utilization and throughput, while interactive use cases call for low latency to deliver a great user experience. (A rough illustration of this tradeoff follows below.)

Learn More >
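
As a purely illustrative sketch of that batching tradeoff (every number below is hypothetical, and real profiles depend on the model, GPU, and sequence lengths), per-step latency grows with batch size while aggregate throughput grows faster:

```python
# Back-of-envelope model of the batching tradeoff (all numbers hypothetical).
# Assume each step pays a fixed cost plus a small marginal cost per sequence
# in the batch; real profiles vary by model, GPU, and sequence length.
FIXED_MS = 20.0     # hypothetical fixed cost per step
MARGINAL_MS = 1.5   # hypothetical added cost per batched sequence

for batch in (1, 8, 32, 128):
    step_ms = FIXED_MS + MARGINAL_MS * batch
    throughput = batch / (step_ms / 1000.0)  # sequences processed per second
    print(f"batch={batch:>3}  step latency={step_ms:6.1f} ms  throughput={throughput:8.1f}/s")
```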

NVIDIA TensorRT

NVIDIA TensorRT is an ecosystem of tools for developers to achieve high-performance deep learning inference.

Learn More >

NVIDIA TensorRT-LLM

NVIDIA TensorRT-LLM is an open-source library built to deliver high-performance, real-time inference optimization for large language models (LLMs) on NVIDIA GPUs, whether on a desktop or in a data center. (A minimal usage sketch follows below.)

Learn More >
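
For a flavor of what the hands-on sessions cover, the sketch below follows TensorRT-LLM's high-level LLM Python API quick start; it assumes a supported NVIDIA GPU with the tensorrt_llm package installed, and the small TinyLlama checkpoint is used purely as an example.

```python
# Minimal TensorRT-LLM sketch via its high-level LLM API (assumes a
# supported NVIDIA GPU; the checkpoint below is just a small example).
from tensorrt_llm import LLM, SamplingParams

# The LLM class compiles the Hugging Face checkpoint into an optimized engine.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(temperature=0.8, top_p=0.95)

for out in llm.generate(["The key to fast LLM inference is"], params):
    print(out.outputs[0].text)
```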

NVIDIA Developer Program

Access free tools, extensive learning opportunities, and expert help with the NVIDIA Developer Program.

Learn More >

NVIDIA NIM Microservices

NVIDIA NIM is a set of microservices for deploying AI models. Tap into the latest AI foundation models, like Stable Diffusion, ESMFold, and Llama 3, with downloadable NIM microservices for your application deployment. (An example query against a running NIM follows below.)

Learn More >
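
Because NIM microservices expose an OpenAI-compatible endpoint, a deployed NIM can be queried with standard clients. A minimal sketch, assuming a NIM container is already serving on localhost:8000 and using meta/llama3-8b-instruct as an example model name:

```python
# Query a locally running NIM microservice through its OpenAI-compatible API.
# Assumes a NIM container is already serving on localhost:8000; the model
# name must match whichever NIM is deployed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")  # local NIM needs no key

resp = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": "In one sentence, what is a NIM microservice?"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```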

NVIDIA Run:ai Tech Blog

Cut model deployment costs while maintaining performance by using GPU memory swap.

Learn More >

Large Language Models

Large language models (LLMs) are deep learning algorithms that can recognize, summarize, translate, predict, and generate content using very large datasets.

Learn More >

NGC Containers

Phind-CodeLlama-34B-v2-Instruct. All you need to build AI: GPU-optimized containers, pretrained models, SDKs, and Helm charts, unified in one catalog for cloud, data center, or edge.

Learn More >

NGC Containers

Llama-3.1-Nemotron-70B-Instruct. All you need to build AI: GPU-optimized containers, pretrained models, SDKs, and Helm charts, unified in one catalog for cloud, data center, or edge.

Learn More >

NGC Containers

Llama-3-Taiwan-70B-Instruct. All you need to build AI: GPU-optimized containers, pretrained models, SDKs, and Helm charts, unified in one catalog for cloud, data center, or edge.

Learn More >

NVOD LLM Inference Benchmarking

Learn how to benchmark end-to-end LLM inference systems and choose the right path for your AI initiatives by understanding the key metrics in large language model (LLM) inference sizing. Watch the video.

Learn More >
