[Pycon] [new paper] "Cenk Bircanoğlu" - Training of Hybrid Recommender for Product Recommendation

Mon 8 Jan 2018 15:32:48 CET

Title: Training of Hybrid Recommender for Product Recommendation
Duration: 60 (includes Q&A)
Q&A Session: 0
Language: en
Type: Talk

Abstract: Boyner Group is the biggest non-food retail company in Turkey. We have millions of products that differ in brand, segment, and other attributes. We have developed a recommendation engine using Spark to make customer-based product recommendations. To build this engine, we implemented a hybrid model based on the CrossCoOccurrence model.
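The abstract does not describe the CrossCoOccurrence model itself. As a rough illustration only, the sketch below assumes it behaves like the correlated cross-occurrence idea known from Apache Mahout: count how often users who purchased item a also viewed item b, then score purchase candidates for a user by summing those counts over the items the user viewed. It uses plain Python instead of Spark and raw counts instead of the log-likelihood-ratio filtering Mahout applies; the function names and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def cross_cooccurrence(primary, secondary):
    """For every pair (a, b), count the users with a primary event on a
    (e.g. a purchase) and a secondary event on b (e.g. a view).

    primary, secondary: dict of user id -> set of item ids.
    Returns dict of item a -> {item b: count}, i.e. a sparse P^T V.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for user, primary_items in primary.items():
        for a, b in product(primary_items, secondary.get(user, set())):
            counts[a][b] += 1
    return {a: dict(row) for a, row in counts.items()}


def recommend(cooc, history, k=3):
    """Score each primary item by summing its counts over the user's
    secondary-event history; return the top k items with positive scores."""
    scores = {a: sum(row.get(b, 0) for b in history) for a, row in cooc.items()}
    ranked = sorted((a for a, s in scores.items() if s > 0),
                    key=lambda a: (-scores[a], a))
    return ranked[:k]


# Hypothetical toy data: purchases are the primary event, views the secondary.
purchases = {"u1": {"shoes"}, "u2": {"shoes", "bag"}, "u3": {"bag"}}
views = {"u1": {"shoes", "socks"}, "u2": {"socks", "bag"}, "u3": {"bag", "hat"}}
cooc = cross_cooccurrence(purchases, views)
print(recommend(cooc, {"socks"}))  # ['shoes', 'bag']
```

Mixing event types this way is what makes the approach "cross": views, which are plentiful, help predict purchases, which are sparse.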

Training uses user, product, and event (transaction, view, and basket) datasets stored in HBase. With Spark, the datasets are loaded from HBase, the recommendation model is trained, and the model's results are stored in Elasticsearch. The results in Elasticsearch are then queried according to the user's history and current event data.
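The serving step (precomputed results stored in Elasticsearch, then queried with the user's history and current event) can be pictured with an in-memory stand-in. This sketch assumes each indexed product document carries an "indicators" field listing related products and that a match on the current event counts more than a match on older history, roughly like boosted should clauses in an Elasticsearch bool query. The field name, the boost value, and the functions are hypothetical, and this is not the Elasticsearch client API.

```python
# Hypothetical in-memory stand-in for the Elasticsearch index: one document
# per product, with an "indicators" field of products related to it.
INDEX = {
    "shoes": {"indicators": ["socks", "laces"]},
    "bag": {"indicators": ["wallet", "socks"]},
    "hat": {"indicators": ["scarf"]},
}


def query(index, history, current=None, current_boost=2.0, size=10):
    """Rank products whose indicators match the user's history (weight 1.0)
    or the item from the current event (weight current_boost)."""
    seen = set(history) | ({current} if current is not None else set())
    scores = {}
    for product_id, doc in index.items():
        if product_id in seen:
            continue  # skip items the user already interacted with
        score = 0.0
        for indicator in doc["indicators"]:
            if indicator in history:
                score += 1.0
            if indicator == current:
                score += current_boost
        if score > 0:
            scores[product_id] = score
    # Highest score first; ties broken by product id for stable output.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:size]


print(query(INDEX, history={"socks"}, current="wallet"))  # ['bag', 'shoes']
```

Keeping the heavy co-occurrence computation in the offline Spark job leaves the online side as a cheap lookup-and-score step like this one.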

Spark's Core, SQL, and ML libraries play the biggest roles in model training. Most of the hybrid recommendation engine is built by applying or adapting machine learning algorithms from the Spark ML library. To make our application flexible, adaptable, and extensible, we tried to extend the Pipeline, Estimator, and Transformer concepts of Spark ML. However, the original interfaces could not accept the multiple datasets our algorithms need, so we implemented our own custom estimators, pipelines, and transformers.
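In Spark ML, Estimator.fit and Transformer.transform each take a single DataFrame, which is the limitation described above. A minimal sketch of one way around it, assuming stages exchange a dictionary of named datasets instead of one DataFrame: plain Python containers stand in for DataFrames, and every class name here (MultiInputEstimator, MultiInputPipeline, SegmentPopularity, and so on) is hypothetical, not the authors' code.

```python
from abc import ABC, abstractmethod
from collections import Counter, defaultdict
from typing import Any, Dict, List

# Dataset name -> dataset. In the real system these would be Spark DataFrames.
Datasets = Dict[str, Any]


class MultiInputTransformer(ABC):
    @abstractmethod
    def transform(self, datasets: Datasets) -> Datasets:
        """Return a new mapping, possibly adding or replacing named datasets."""


class MultiInputEstimator(ABC):
    @abstractmethod
    def fit(self, datasets: Datasets) -> MultiInputTransformer:
        """Learn from any of the named datasets and return a fitted transformer."""


class MultiInputPipelineModel(MultiInputTransformer):
    def __init__(self, stages: List[MultiInputTransformer]):
        self.stages = stages

    def transform(self, datasets: Datasets) -> Datasets:
        current = dict(datasets)
        for stage in self.stages:
            current = stage.transform(current)
        return current


class MultiInputPipeline(MultiInputEstimator):
    def __init__(self, stages: List[Any]):
        self.stages = stages

    def fit(self, datasets: Datasets) -> MultiInputPipelineModel:
        fitted = []
        current = dict(datasets)
        for stage in self.stages:
            # Estimators are fitted first; every stage then feeds the next one.
            if isinstance(stage, MultiInputEstimator):
                stage = stage.fit(current)
            current = stage.transform(current)
            fitted.append(stage)
        return MultiInputPipelineModel(fitted)


class SegmentPopularityModel(MultiInputTransformer):
    def __init__(self, top_by_segment: Dict[str, str]):
        self.top_by_segment = top_by_segment

    def transform(self, datasets: Datasets) -> Datasets:
        out = dict(datasets)
        out["recommendations"] = {
            user: self.top_by_segment[segment]
            for user, segment in datasets["users"].items()
            if segment in self.top_by_segment
        }
        return out


class SegmentPopularity(MultiInputEstimator):
    """Hypothetical stage that needs both "events" and "products" to fit."""

    def fit(self, datasets: Datasets) -> SegmentPopularityModel:
        counts = defaultdict(Counter)
        for _user, product_id in datasets["events"]:
            segment = datasets["products"].get(product_id)
            if segment is not None:
                counts[segment][product_id] += 1
        # Most frequent product per segment; ties broken by product id.
        top = {
            segment: min(c.items(), key=lambda kv: (-kv[1], kv[0]))[0]
            for segment, c in counts.items()
        }
        return SegmentPopularityModel(top)


datasets = {
    "events": [("u1", "p1"), ("u2", "p1"), ("u2", "p2"), ("u3", "p3")],
    "products": {"p1": "sport", "p2": "sport", "p3": "home"},
    "users": {"u1": "sport", "u2": "home"},
}
model = MultiInputPipeline([SegmentPopularity()]).fit(datasets)
result = model.transform(datasets)
print(result["recommendations"])  # {'u1': 'p1', 'u2': 'p3'}
```

Passing a mapping keeps the fit/transform shape of Spark ML pipelines while letting a single stage read users, products, and events together.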

We implemented new estimators for matrix factorization, clustering and classification, the XMeans model, the CrossCoOccurrence model, and custom clustering and grouping algorithms. We also implemented new transformers with custom UDFs, for example to convert a user's birth date to an age. In addition, we compute the cosine similarity between product features to find similar products, and we compute feature-based similarities between users in the same way. We also built a content-based filtering algorithm from the similarities of product images. Finally, popular, hot, and trending products are computed with Spark SQL and ML.
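Two of the pieces above are small enough to sketch directly. The first is converting a birth date to an age, the kind of logic a custom UDF would wrap (for example with pyspark.sql.functions.udf and an IntegerType return type, with the reference date fixed). The second is ranking products by the cosine similarity of their feature vectors. The functions and feature vectors below are hypothetical and written in plain Python; with millions of products, the pairwise loop would have to be replaced by a distributed or approximate approach.

```python
import math
from datetime import date


def birth_date_to_age(birth_date, today):
    """Whole years between birth_date and today; None passes through."""
    if birth_date is None:
        return None
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def cosine_similarity(u, v):
    """cos(u, v) = u.v / (|u| |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(features, product_id, k=2):
    """Return the k other products with the highest cosine similarity."""
    target = features[product_id]
    scored = [(other, cosine_similarity(target, vec))
              for other, vec in features.items() if other != product_id]
    scored.sort(key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in scored[:k]]


# Hypothetical product feature vectors.
features = {
    "p1": [1.0, 0.0, 1.0],
    "p2": [1.0, 0.0, 0.9],
    "p3": [0.0, 1.0, 0.0],
    "p4": [0.5, 0.5, 0.5],
}
print(birth_date_to_age(date(1990, 6, 15), date(2018, 1, 8)))  # 27
print(most_similar(features, "p1"))  # ['p2', 'p4']
```

The same cosine_similarity function applies unchanged to user feature vectors or to image embeddings, which is what makes it a convenient shared building block for the product, user, and image-based similarities.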

We have been running this Spark implementation successfully in production for 3 months.

All implementations are written in Python and PySpark.

Tags: Recommendation, distributed-systems, productivity, spark