Foreword by Laurence Moroney

This is an important book!

Data is the lifeblood of any Machine Learning or AI solution, and there is only so far you can go with publicly available datasets. What excites me about this book is that Jigyasa and Rishabh go beyond these, and teach you how to create, curate, and manage data effectively.

They will take you through a number of scenarios in which they gathered real-world data from varied sources like online retail and news aggregator websites. Rather than settling for a rough copy-and-paste, they demonstrate the pipeline involved in making the dataset eminently usable.

Chapter 3 of this book is especially powerful: you’ll see how to work, from first principles, through the processes of data trimming, anonymization, standardization, transformation, and balancing. Chapter 4 will take you through the important task of feature engineering, where, instead of just throwing raw data at the problem, you can refine and improve it with clipping, scaling, bucketization, and a lot more.

All of this will prepare you for Machine Learning with your own custom data that you have sourced, cleaned, and managed for optimal model creation.

I am really excited by this field, and delighted that a book like this one exists. Pick it up, read, learn, enjoy!

Laurence Moroney
Lead Artificial Intelligence Advocate, Google