Insights & Perspectives

Blog & Articles

Deep dives into Generative AI, Recommender Systems, LLM Inference Optimization, and Machine Learning Architecture.

May 02, 2026
Advanced LLM Inference Optimization Techniques
An overview of advanced LLM inference optimization techniques, detailing how model parallelism, Grouped-Query Attention, and specialized serving engines overcome the Memory Wall.
Read Article
April 15, 2026
From AI Enthusiast to Inference Architect: Mastering GPU Profiling
Stop obsessing over prompts and start profiling GPUs. Master Arithmetic Intensity, KV Caching, and Scheduling to transition from an AI Enthusiast to an Inference Architect.
Read Article
February 10, 2026
Speculative Decoding: The Mathematics of Acceptance Rate
Speculative Decoding can make LLMs 3x faster, or it can slow them down. Master the mathematics of the Acceptance Rate to determine when and how to deploy this powerful optimization technique.
Read Article
January 15, 2026
Quantization Tradeoffs in LLMs: The Unpacking Tax
Quantizing an LLM to 4-bit doesn't guarantee lower latency. Understand the hidden 'Unpacking Tax' and learn how batch sizes dictate whether quantization helps or hurts your performance.
Read Article
November 20, 2025
You Don't Need More H100s: Understanding PagedAttention
Stop buying more GPUs. Learn how PagedAttention and vLLM decouple physical from logical memory to eliminate fragmentation and unlock continuous batching for massive throughput gains.
Read Article
October 05, 2025
KV Cache: The Silent Killer of Your Inference Budget
The KV Cache is the hidden memory tax causing your LLM applications to crash. Discover the math behind the bottleneck and how fragmentation eats your inference budget.
Read Article
August 10, 2025
Why Your Team Built the Wrong Recommender
Traditional Retrieve-and-Rank models treat user history like a grocery list. Learn why the industry is shifting toward 'Encode and Generate' paradigms to capture temporal user intent.
Read Article
July 20, 2025
The Visual Cold-Start Trap in Recommender Systems
Uncover the 'Visual Cold-Start Trap' in recommendation systems and learn how to use Contrastive Learning and Cross-Attention to ensure new products get the visibility they deserve.
Read Article
May 15, 2025
The LLM-ERS Pattern: Where LLMs Actually Belong in RecSys
Why using massive LLMs for real-time ranking fails in production, and how the LLM-ERS pattern offers a scalable blueprint by keeping retrieval fast and utilizing LLMs for offline augmentation.
Read Article
April 10, 2025
Building LLM-Powered Recommendation Systems: My LinkedIn Learning Course
An exclusive sneak peek into my upcoming LinkedIn Learning course, which explores the paradigm shift of integrating Large Language Models into massive-scale personalization pipelines.
Read Article
March 20, 2025
Unlock Hyper-Personalized Advertising with Generative AI
Explore how Generative AI is moving digital marketing from broad segmentation to hyper-personalization by dynamically synthesizing visual and textual ad creatives in real time.
Read Article
February 10, 2025
What is Generative AI? Peeling Back the Magic Behind the Buzz
A technical primer on Generative AI, breaking down the mechanics of Transformers and Diffusion models, and exploring how self-supervised learning is transforming the industry.
Read Article
January 15, 2025
Common Pitfalls in ML Interviews and How to Avoid Them
Discover the five critical mistakes candidates make in Machine Learning interviews, from neglecting fundamentals to ignoring production reality, and learn how to overcome them.
Read Article
January 01, 2019
Machine Learning Glossary
A comprehensive, easy-to-digest glossary designed to quickly clarify popular concepts, evaluation metrics, and algorithms encountered in Machine Learning interviews and literature.
Read Article
December 01, 2018
Would this clothing product fit me?
An explanation of my RecSys paper addressing the challenge of providing accurate, personalized fit guidance in online fashion retail using advanced machine learning models.
Read Article
December 07, 2017
Inference using EM algorithm
A deep dive into the mathematical derivation of the Expectation-Maximization (EM) algorithm, demonstrating how to tackle inference problems involving latent variables.
Read Article
October 18, 2017
Maximum Likelihood Estimates - Motivation for EM algorithm
Understand the core premise of Maximum Likelihood Estimates in Logistic Regression, and how this foundational concept motivates the powerful Expectation-Maximization (EM) algorithm.
Read Article
July 04, 2017
Introduction to Support Vector Machines - Soft Margin Formulation and Kernel Trick
The second part of the SVM series, explaining how the Soft Margin formulation and the Kernel Trick allow models to elegantly handle noisy and linearly inseparable datasets.
Read Article
June 28, 2017
Introduction to Support Vector Machines - Motivation and Basics
A foundational introduction to Support Vector Machines (SVM), covering the geometric intuition behind finding the optimal margin classifier for linearly separable data.
Read Article
December 30, 2016
Text Prediction - Behind the Scenes
An inside look at how mobile keyboard applications use language models and smoothing techniques to predict your next word with increasing accuracy.
Read Article