Mandatory Prerequisites

  • Participants should have basic knowledge of Python, containerized environments, and experience working in Jupyter/Colab or similar notebook workflows.
  • Languages/tools: Python
  • Frameworks: PyTorch, TensorRT-LLM, Triton Inference Server™, SGLang, vLLM
  • Set up an account on build.nvidia.com before the workshop.
  • Bring a laptop with internet access (ideal minimum: 5 Mbps download, 1–2 Mbps upload) to ensure consistent access to the lab.
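As a warm-up for the hands-on labs, the sketch below shows one way to call a hosted model on build.nvidia.com through its OpenAI-compatible chat-completions endpoint. The endpoint URL and the model name are assumptions for illustration, not lab instructions; swap in whatever model the workshop uses. The request is only sent if an API key is present in the environment.

```python
import json
import os
import urllib.request

# Assumed endpoint and example model for illustration only; the workshop
# may use different ones. Get an API key at build.nvidia.com.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL = "meta/llama-3.1-8b-instruct"  # assumed example model


def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
        "temperature": 0.2,
    }


payload = build_request("What is TensorRT-LLM?")

# Only send the request if an API key is available in the environment.
api_key = os.environ.get("NVIDIA_API_KEY")
if api_key:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Verifying that a snippet like this runs on your laptop before the event is a quick way to confirm both your network connection and your build.nvidia.com access.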

Get Started With Your AI Inference Journey

Discover how Tech Mahindra is collaborating with NVIDIA to be at the forefront of generative AI innovation.

From developing therapeutic molecules to building India’s sovereign LLM in Hindi and more than 37 dialects, Tech Mahindra is using NVIDIA’s hardware and software stack to build the Nemotron-4-Mini-Hindi-4B model. Tech Mahindra’s work extending Indus 2.0 to Bahasa Indonesia is state of the art and built on NVIDIA AI inference software.

Speakers

Bharat Giddwani

Senior Solutions Architect

NVIDIA

Bharat is a seasoned senior solutions architect specializing in enterprise-scale generative AI solutions, with deep expertise in large language models (LLMs), multimodal AI, and retrieval-augmented generation (RAG) optimization. He designs and implements robust, secure AI architectures that deliver measurable business impact, and his technical expertise extends to advanced LLM techniques, including inference and training optimization. His solutions emphasize production readiness, incorporating monitoring and security controls that enable organizations such as cloud providers, ISVs, and enterprises to successfully navigate their AI transformation journey.

Agenda

9:00 a.m.
Registration and Networking
10:00 a.m.
Welcome and Introduction to the NVIDIA Ecosystem

NSUT, Dwarka, Delhi | Anish Mukherjee

10:15 a.m.
Introduction to NSUT and Available Facilities

NSUT, Dwarka, Delhi | Prof. Anand Srivastava, Vice Chancellor

10:30 a.m.
Accelerating LLM Inference With TensorRT and TensorRT-LLM

NVIDIA AI Blueprint: Bring Your LLM to NIM

(Hands-On)

NSUT, Dwarka, Delhi | Anish Mukherjee and Bharat Giddwani

11:00 a.m.
Tea/Coffee Break
12:15 p.m.
Accelerating LLM Inference With TensorRT and TensorRT-LLM

(Hands-On)

NSUT, Dwarka, Delhi | Bharat Giddwani and Anish Mukherjee

1:45 p.m.
Lunch Break
2:30 p.m.
Disaggregated Serving Using NVIDIA Dynamo

NSUT, Dwarka, Delhi | Anish Mukherjee and Bharat Giddwani

4:00 p.m.
Tea/Coffee Break
4:15 p.m.
Scaling LLM Inference Using DGX Cloud Lepton/NVCF
NSUT, Dwarka, Delhi | Anish Mukherjee and Bharat Giddwani
5:00 p.m.
Closing and Networking
Time Zone: (UTC+05:30) Kolkata

Event Details

NVIDIA Hands-On Training on Inference

Friday, October 31, 2025

Netaji Subhas University of Technology, Azad Hind Fauj Marg
Dwarka
Delhi DL 110078
India


Additional Resources

NVIDIA AI Enterprise Solutions

Explore the most advanced AI, ready for the enterprise, and the latest breakthroughs made possible with NVIDIA AI.

Learn More >

NVIDIA AI Inference Solutions

Greater AI performance, compounded returns. Think SMART. Think NVIDIA Inference.

Learn More >

NVIDIA Inference Performance

Inference can be deployed in many ways, depending on the use case. Offline data processing is best done at larger batch sizes, which deliver optimal GPU utilization and throughput. Interactive use cases, by contrast, prioritize low latency to deliver great user experiences.
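The throughput-versus-latency trade-off described above can be sketched with a toy model. The timing numbers below are made-up assumptions, not measured figures: assume each batch pays a fixed overhead plus a per-request cost, so batching raises throughput at the price of higher per-request latency.

```python
# Toy illustration of the batch-size trade-off. All timings are assumed,
# illustrative values, not benchmarks of any real system.

def batch_stats(batch_size: int, overhead_ms: float = 20.0, per_req_ms: float = 5.0):
    """Return (latency_ms, throughput_req_per_s) for one batch."""
    # Every request in the batch waits for the whole batch to finish.
    latency_ms = overhead_ms + per_req_ms * batch_size
    # Requests completed per second once the batch is done.
    throughput = batch_size / (latency_ms / 1000.0)
    return latency_ms, throughput


for bs in (1, 8, 32):
    lat, thr = batch_stats(bs)
    print(f"batch={bs:>2}  latency={lat:6.1f} ms  throughput={thr:7.1f} req/s")
```

Under these assumed numbers, throughput grows with batch size while latency also grows, which is exactly why offline workloads favor large batches and interactive workloads favor small ones.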

Learn More >

NVIDIA TensorRT

NVIDIA TensorRT is an ecosystem of tools for developers to achieve high-performance deep learning inference.

Learn More >

NVIDIA TensorRT-LLM

NVIDIA TensorRT-LLM is an open-source library built to deliver high-performance, real-time inference optimization for large language models (LLMs) on NVIDIA GPUs—whether on a desktop or in a data center.

Learn More >

NVIDIA Developer Program

Access free tools, extensive learning opportunities, and expert help with the NVIDIA Developer Program.

Learn More >

NVIDIA NIM Microservices

NVIDIA NIM™ is a set of microservices for deploying AI models. Tap into the latest AI foundation models—like Stable Diffusion, ESMFold, and Llama 3—with downloadable NIM microservices for your application deployment.

Learn More >

NVIDIA Run:ai Tech Blog

Cut model deployment costs while keeping performance with GPU memory swap.

Learn More >

Large Language Models

Large language models (LLMs) are deep learning algorithms that can recognize, summarize, translate, predict, and generate content using very large datasets.

Learn More >

NGC Containers

Phind-CodeLlama-34B-v2-Instruct. All you need to build AI: GPU-optimized containers, pretrained models, SDKs, and Helm charts, unified in one catalog for cloud, data center, or edge.

Learn More >

NGC Containers

Llama-3.1-Nemotron-70B-Instruct. All you need to build AI: GPU-optimized containers, pretrained models, SDKs, and Helm charts, unified in one catalog for cloud, data center, or edge.

Learn More >

NGC Containers

Llama-3-Taiwan-70B-Instruct. All you need to build AI: GPU-optimized containers, pretrained models, SDKs, and Helm charts, unified in one catalog for cloud, data center, or edge.

Learn More >

NVOD: LLM Inference Benchmarking

Learn how to benchmark end-to-end LLM inference systems and choose the right path for your AI initiatives by understanding the key metrics in large language model (LLM) inference sizing. Watch the video.

Learn More >
