Research Papers

  • RecSys’18

    • Rishabh Misra, Mengting Wan, Julian McAuley, “Decomposing Fit Semantics for Product Size Recommendation in Metric Spaces”, In Proceedings of the 2018 ACM Conference on Recommender Systems (RecSys’18), Vancouver, Canada, Oct. 2018.
    • Paper | Code | Datasets
  • MUSE’15

    • Avijit Saha*, Rishabh Misra*, Balaraman Ravindran, “Scalable Bayesian Matrix Factorization”, In Proceedings of the 6th International Conference on Mining Ubiquitous and Social Environments (MUSE) @ PKDD/ECML, Porto, Portugal, Sep. 7, 2015, pp. 43-54. (* equal contribution)
    • Paper | Code
  • Pre-print

    • Avijit Saha, Rishabh Misra, Ayan Acharya, Balaraman Ravindran, “Scalable Variational Bayesian Factorization Machine”, pre-print.
    • Paper | Code

Datasets

  • Clothing Fit Dataset for Size Recommendation [Released: August 2018]

    • Product size recommendation and fit prediction are critical for improving customers’ shopping experiences and reducing product return rates. However, modeling customers’ fit feedback is challenging because of its subtle semantics, which arise from customers’ subjective evaluation of products, and because of the imbalanced label distribution (most feedback is “Fit”). These datasets, collected from ModCloth and RentTheRunWay, are the only fit-related datasets publicly available at this time and can be used to address these challenges and improve the recommendation process. (100+ downloads on Kaggle)
  • News Headlines Dataset For Sarcasm Detection [Released: June 2018]

    • Past studies in sarcasm detection mostly use Twitter datasets collected with hashtag-based supervision, but such datasets are noisy in terms of both labels and language. Furthermore, many tweets are replies to other tweets, and detecting sarcasm in them requires access to the contextual tweets. To avoid the noise found in Twitter datasets, this News Headlines dataset for Sarcasm Detection is collected from two news websites. TheOnion produces sarcastic versions of current events, and we collected all the headlines from its News in Brief and News in Photos categories (which are sarcastic). We collected real (non-sarcastic) news headlines from HuffPost. (300+ downloads on Kaggle)
  • News Category Dataset [Released: June 2018]

    • This dataset contains around 125k news headlines from 2013 to 2018, obtained from HuffPost. It could be used to produce interesting linguistic insights about the type of language used in different kinds of news articles, or simply to identify tags for untagged news articles. (700+ downloads on Kaggle)