Metric Learning for Classification

Introduction

In machine learning, models typically fall into two categories: learning-based and similarity-based. Learning-based models have parameters that are tuned on training data to minimize a loss using an appropriate optimization method. Similarity-based models, in contrast, produce predictions on unseen data instances based on their similarity (or closeness) to previously seen data instances. They are robust when there is label noise in the dataset (that is, some of the provided labels may themselves be wrong). However, how well such methods perform depends primarily on how we quantify the similarity/closeness between two data instances, and to some extent on how we represent data instances. Typical distance measures like the Euclidean distance might not capture the notion of similarity in different scenarios. In this post, the agenda is to discuss the Metric Learning technique, which combines the learning-based and similarity-based approaches and aims to automatically learn a Mahalanobis distance metric under which data instances of the same class are more similar to each other than to instances of other classes.
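
To make the idea concrete, here is a minimal sketch (in Python with NumPy) of how a Mahalanobis distance is evaluated once a positive semi-definite matrix M has been learned. The matrix M below is a made-up example chosen only for illustration, not one produced by the method discussed in the post; setting M to the identity matrix recovers the ordinary Euclidean distance.

import numpy as np

def mahalanobis_distance(x, y, M):
    """Distance between vectors x and y under a (learned) matrix M.

    M is assumed to be positive semi-definite; M = identity recovers
    the usual Euclidean distance.
    """
    diff = x - y
    return np.sqrt(diff @ M @ diff)

# Toy illustration: a hypothetical metric that down-weights the second feature.
M = np.array([[1.0, 0.0],
              [0.0, 0.1]])
x = np.array([0.0, 0.0])
y = np.array([1.0, 3.0])

print(mahalanobis_distance(x, y, np.eye(2)))  # Euclidean distance, ~3.16
print(mahalanobis_distance(x, y, M))          # distance under M, ~1.38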

Read More

Inference using EM algorithm

Introduction

In the previous post, we learnt about scenarios where the Expectation Maximization (EM) algorithm could be useful, along with a basic outline of the algorithm for inferring the model parameters. If you haven't already, I would encourage you to read that first so that you have the necessary context. In this post, we will dive deeper into understanding the algorithm. First, we will try to understand how the EM algorithm optimizes the log-likelihood at every step. Although a bit mathematical, this will in turn help us understand how we can use various approximation methods for inference when the E-step (calculating the posterior of the hidden variables given the observed variables and parameters) is not tractable. Disclaimer: this post is a bit mathematical.
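
As a concrete illustration of what the E-step computes, here is a minimal sketch for a two-component 1-D Gaussian mixture. The model, the function name e_step, and the numbers are assumptions made purely for illustration, not necessarily what the post uses; the sketch simply returns, for each observation, the posterior probability of each hidden component given the current parameters.

import numpy as np
from scipy.stats import norm

def e_step(x, weights, means, stds):
    """E-step for a 1-D Gaussian mixture: posterior responsibility of each
    component for each observation, given the current parameter estimates."""
    # Unnormalized posterior: prior weight times likelihood of each point.
    resp = np.vstack([w * norm.pdf(x, m, s)
                      for w, m, s in zip(weights, means, stds)]).T
    # Normalize across components so each row sums to one.
    return resp / resp.sum(axis=1, keepdims=True)

x = np.array([-2.0, 0.1, 3.0])
responsibilities = e_step(x, weights=[0.5, 0.5], means=[-1.0, 2.0], stds=[1.0, 1.0])
print(responsibilities)  # each row is P(component | x_i) and sums to 1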

Read More

Maximum Likelihood Estimates - Motivation for EM algorithm

Introduction

To solve any data science problem, we first obtain a dataset and explore it; then, guided by the findings, we try to come up with a model to tackle the problem. Once all of that is done, our next task is to find a way to estimate the parameters of the model from the dataset we have, so that we can make predictions on unseen data. In this post, we will learn how to estimate the parameters of a model using the Maximum Likelihood approach, which has a very simple premise: find the parameters that maximize the likelihood of the observed data. Through that, I will motivate the Expectation-Maximization (EM) algorithm, which is considered an important tool in statistical analysis. This post assumes familiarity with Logistic Regression.
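
As a small taste of that premise, here is a minimal sketch using a hypothetical coin-flip dataset (the data and helper names are made up for illustration): we write down the Bernoulli log-likelihood and pick the parameter that maximizes it over a grid, which agrees with the closed-form maximum likelihood estimate, the observed fraction of heads.

import numpy as np

# Hypothetical observed coin flips: 1 = heads, 0 = tails.
flips = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

def log_likelihood(p, data):
    """Log-likelihood of a Bernoulli(p) model for the observed flips."""
    return np.sum(data * np.log(p) + (1 - data) * np.log(1 - p))

# Evaluate the log-likelihood on a grid of candidate parameter values.
candidates = np.linspace(0.01, 0.99, 99)
best_p = candidates[np.argmax([log_likelihood(p, flips) for p in candidates])]

print(best_p)        # ~0.7, the grid maximizer
print(flips.mean())  # closed-form MLE: the fraction of heads, 0.7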

Read More

Introduction to Support Vector Machines - Motivation and Basics

Introduction

In this post, you will learn about the basics of Support Vector Machines (SVM), a well-regarded supervised machine learning algorithm. This technique deserves a place in everyone's toolbox, especially for anyone who aspires to be a data scientist one day. Since there's a lot to learn, I'll introduce SVM to you across two posts so that you can have a coffee break in between :)

Read More

Text Prediction - Behind the Scenes

Introduction

These days, one of the common features of a good keyboard application is the prediction of upcoming words. These predictions get better and better as you use the application, thus saving users' effort. Another application of text prediction is in search engines. Predictive search saves effort and guides visitors to results, rather than having them type queries that are slightly off and return few relevant results. As a consumer of these applications, I am sure you have wondered "How exactly does this prediction work?" at least once. Well, wonder no more, because in this article I will give you some insight into what goes on behind the scenes to produce these predictions. So, let's get started.

Read More