Insights & Perspectives
Blog & Articles
Deep dives into Generative AI, Recommender Systems, LLM Inference Optimization, and Machine Learning Architecture.
Advanced LLM Inference Optimization Techniques
An overview of advanced LLM inference optimization techniques, detailing how model parallelism, Grouped-Query Attention, and specialized serving engines overcome the Memory Wall.
From AI Enthusiast to Inference Architect: Mastering GPU Profiling
Stop obsessing over prompts and start profiling GPUs. Master Arithmetic Intensity, KV Caching, and Scheduling to transition from an AI Enthusiast to an Inference Architect.
Speculative Decoding: The Mathematics of Acceptance Rate
Speculative Decoding can make LLMs 3x faster—or slow them down. Master the mathematics of the Acceptance Rate to determine when and how to deploy this powerful optimization technique.
Quantization Tradeoffs in LLMs: The Unpacking Tax
Quantizing an LLM to 4-bit doesn't guarantee lower latency. Understand the hidden 'Unpacking Tax' and learn how batch sizes dictate whether quantization helps or hurts your performance.
You Don't Need More H100s: Understanding PagedAttention
Stop buying more GPUs. Learn how PagedAttention and vLLM decouple physical from logical memory to eliminate fragmentation and unlock continuous batching for massive throughput gains.
KV Cache: The Silent Killer of Your Inference Budget
The KV Cache is the hidden memory tax causing your LLM applications to crash. Discover the math behind the bottleneck and how fragmentation eats your inference budget.
Why Your Team Built the Wrong Recommender
Traditional Retrieve-and-Rank models treat user history like a grocery list. Learn why the industry is shifting toward 'Encode and Generate' paradigms to capture temporal user intent.
The Visual Cold-Start Trap in Recommender Systems
Uncover the 'Visual Cold-Start Trap' in recommendation systems and learn how to use Contrastive Learning and Cross-Attention to ensure new products get the visibility they deserve.
The LLM-ERS Pattern: Where LLMs Actually Belong in RecSys
Why using massive LLMs for real-time ranking fails in production, and how the LLM-ERS pattern offers a scalable blueprint by keeping retrieval fast and using LLMs for offline augmentation.
Building LLM-Powered Recommendation Systems: My LinkedIn Learning Course
An exclusive sneak peek into my upcoming LinkedIn Learning course, which explores the paradigm shift of integrating Large Language Models into massive-scale personalization pipelines.
Unlock Hyper-Personalized Advertising with Generative AI
Explore how Generative AI is moving digital marketing from broad segmentation to hyper-personalization by dynamically synthesizing visual and textual ad creatives in real time.
What is Generative AI? Peeling back the magic behind the buzz
A technical primer on Generative AI, breaking down the mechanics of Transformers and Diffusion models, and exploring how self-supervised learning is transforming the industry.
Common Pitfalls in ML Interviews and How to Avoid Them
Discover the five critical mistakes candidates make in Machine Learning interviews—from neglecting fundamentals to ignoring production reality—and learn how to overcome them.
Machine Learning Glossary
A comprehensive, easy-to-digest glossary designed to quickly clarify popular concepts, evaluation metrics, and algorithms encountered in Machine Learning interviews and literature.
Would this clothing product fit me?
An explanation of my RecSys paper addressing the challenge of providing accurate, personalized fit guidance in online fashion retail using advanced machine learning models.
Inference using the EM algorithm
A deep dive into the mathematical derivation of the Expectation-Maximization (EM) algorithm, demonstrating how to tackle inference problems involving latent variables.
Maximum Likelihood Estimates - Motivation for EM algorithm
Understand the core premise of Maximum Likelihood Estimates in Logistic Regression, and how this foundational concept motivates the powerful Expectation-Maximization (EM) algorithm.
Introduction to Support Vector Machines - Soft Margin Formulation and Kernel Trick
The second part of the SVM series, explaining how the Soft Margin formulation and the Kernel Trick allow models to elegantly handle noisy and linearly inseparable datasets.
Introduction to Support Vector Machines - Motivation and Basics
A foundational introduction to Support Vector Machines (SVM), covering the geometric intuition behind finding the optimal margin classifier for linearly separable data.
Text Prediction - Behind the Scenes
An inside look at how mobile keyboard applications use language models and smoothing techniques to predict your next word with increasing accuracy.