Throughout the rapid growth of Machine Learning and Data Science in recent years, data has remained the key foundation for almost any downstream research, analysis, or intelligent product feature development. Numerous books and courses now help people master the skills of consuming data; however, very few resources discuss how to carefully collect, process, and curate high-quality datasets. I worked with Rishabh Misra on several research projects at UC San Diego and learned many practical data collection and processing skills from him. I was therefore excited to hear that Jigyasa and Rishabh are willing to share their knowledge in this domain, and I really appreciate their efforts on this book.
The book introduces critical data collection, extraction, preparation, and processing skills. It also provides several Machine Learning application examples and approaches data problems from an application-oriented perspective. I personally find that this book can be very helpful for researchers and practitioners: it can remove their data availability obstacles, help them gather the data they need proactively but responsibly, and help them understand the strengths as well as the limitations of their datasets. In this regard, I think the book is an ideal starting point for data enthusiasts who want to learn the dataset collection process from scratch.
Mengting Wan
Senior Applied Scientist, Microsoft