As you may already know, or may have gathered from perusing this website a little, I like to read. Reading helped me through some of the toughest periods of my life, and I feel it can help a lot of other people through theirs as well. Although this project is only just starting out, it strikes very close to my heart.
‘It was books that taught me that the things that tormented me most were the very things that connected me with all the people who were alive, who had ever been alive.’ - James Baldwin
*The dataset used here, entitled Book Recommendation Dataset, was created by Möbius on Kaggle.
Recommendation engines are information filtering systems, usually associated with machine learning algorithms, that use data to suggest additional products or services to consumers. These systems are often based on a variety of variables, such as past purchases, search histories and demographic information. A classic example is the movie recommendation engine used by streaming services like Netflix or Prime Video, which suggests what a customer should watch next based on what they have watched in the past, how old they are, what other viewers have watched, and so on.
The aim of this project is to create an effective recommendation engine that makes new book suggestions given a current book. There are three different approaches to this recommendation engine. The first is a user-based collaborative filtering approach that applies cosine similarity to historical user rating data to measure the similarity of book preferences. The second is a variation on the first that aims to keep recommendations scalable as more data accumulates, using Locality Sensitive Hashing (LSH) to search in sub-linear time. The third is a content-based approach using Word2Vec: it assigns each book title to a specific genre and then computes the cosine similarity between the combined title-and-genre vectors.
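The post does not say which LSH scheme or library the second approach used. As one illustration, here is a minimal pure-Python sketch of random-hyperplane LSH (SimHash), a standard LSH family for cosine similarity. Each item vector is hashed into a bucket according to which side of several random hyperplanes it falls on, and a query is scored exactly only against items that share a bucket with it, rather than against the whole catalogue. The class name `CosineLSH` and the parameters `n_planes` and `n_tables` are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Exact cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CosineLSH:
    """Random-hyperplane LSH: vectors separated by a small angle tend to share a bucket.

    Hypothetical helper for illustration; n_planes and n_tables are assumed knobs.
    """

    def __init__(self, dim, n_planes=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # Each table gets its own random Gaussian hyperplanes and its own buckets.
        self.tables = [
            ([[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)],
             defaultdict(list))
            for _ in range(n_tables)
        ]
        self.vectors = {}

    @staticmethod
    def _signature(vec, planes):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0) for plane in planes)

    def add(self, key, vec):
        self.vectors[key] = vec
        for planes, buckets in self.tables:
            buckets[self._signature(vec, planes)].append(key)

    def query(self, vec, k=5):
        # Only items sharing a bucket in at least one table are scored exactly,
        # instead of comparing the query against every indexed item.
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(self._signature(vec, planes), []))
        scored = [(cosine(vec, self.vectors[c]), c) for c in candidates]
        return [key for _, key in sorted(scored, reverse=True)[:k]]


lsh = CosineLSH(dim=3)
lsh.add("book_a", [1.0, 0.9, 0.0])
lsh.add("book_b", [-1.0, -0.9, 0.0])  # points the opposite way, so never shares a bucket
print(lsh.query([1.0, 0.9, 0.0], k=2))  # → ['book_a']
```

More hyperplanes per table make buckets smaller and queries faster, but increase the chance of missing a true neighbour; adding tables recovers some of that recall at the cost of extra memory.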
After a book was selected, the user-based recommendation algorithm examined the relationships between users (people with similar tastes) and recommended relevant books. It focused on users who had posted a reasonably large number of reviews and on books that had received numerous user reviews.
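A minimal sketch of this step, under stated assumptions: ratings arrive as `(user, book, rating)` triples, each book is represented by the vector of ratings that users gave it, and other books are ranked by cosine similarity to the selected one. The thresholds `min_user_ratings` and `min_book_ratings` are hypothetical placeholders, since the post does not state the actual cut-offs, and the filter is applied in a single pass. The function names and the sample ratings are illustrative, not the author's code or data.

```python
import math
from collections import Counter, defaultdict


def filter_active(ratings, min_user_ratings, min_book_ratings):
    """Keep only ratings from active users on frequently rated books (single pass)."""
    user_counts = Counter(user for user, _, _ in ratings)
    book_counts = Counter(book for _, book, _ in ratings)
    return [
        (user, book, rating)
        for user, book, rating in ratings
        if user_counts[user] >= min_user_ratings and book_counts[book] >= min_book_ratings
    ]


def book_vectors(ratings):
    """Map each book to a sparse {user: rating} vector."""
    vectors = defaultdict(dict)
    for user, book, rating in ratings:
        vectors[book][user] = rating
    return vectors


def sparse_cosine(a, b):
    # Users who rated only one of the two books contribute nothing to the dot product.
    dot = sum(r * b[u] for u, r in a.items() if u in b)
    norm = math.sqrt(sum(r * r for r in a.values())) * math.sqrt(sum(r * r for r in b.values()))
    return dot / norm if norm else 0.0


def recommend(selected, ratings, k=5, min_user_ratings=2, min_book_ratings=2):
    # Hypothetical default thresholds; the post does not give the real values.
    vectors = book_vectors(filter_active(ratings, min_user_ratings, min_book_ratings))
    if selected not in vectors:
        return []
    scores = [
        (sparse_cosine(vectors[selected], vec), book)
        for book, vec in vectors.items()
        if book != selected
    ]
    return [book for score, book in sorted(scores, reverse=True)[:k] if score > 0]


ratings = [
    ("u1", "Dune", 10), ("u1", "Foundation", 9), ("u1", "Emma", 2),
    ("u2", "Dune", 8), ("u2", "Foundation", 8), ("u2", "Emma", 3),
    ("u3", "Emma", 10), ("u3", "Persuasion", 9), ("u3", "Dune", 1),
    ("u4", "Emma", 9), ("u4", "Persuasion", 10),
    ("u5", "Dune", 7),  # u5 has a single rating and is filtered out
]
print(recommend("Dune", ratings, k=2))  # → ['Foundation', 'Emma']
```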
The content-based recommendation used the information in the book title to assign each book to a relevant genre and then made recommendations based on the text of both the title and the genre. The content-based approach may be limited by the lack of relevant content data in the dataset: it has no pre-determined labels or any other content data that would be useful to a content-based recommendation engine. In the future, with more relevant content data and more compute power, it may be possible to develop a far superior content-based recommendation engine.
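The title-plus-genre idea can be sketched as follows, assuming each book's genre has already been assigned (the post does not describe how titles were mapped to genres) and that word vectors are available as a word-to-vector mapping. The tiny two-dimensional `word_vectors` below are made up for illustration and stand in for vectors from a trained Word2Vec model; the function names are hypothetical. Each book is embedded as the average of its title and genre word vectors, and other books are ranked by cosine similarity.

```python
import math


def embed(text, word_vectors):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_by_content(selected, books, word_vectors, k=5):
    """books maps title -> genre; each book is embedded as its title plus its genre."""
    vectors = {}
    for title, genre in books.items():
        vec = embed(f"{title} {genre}", word_vectors)
        if vec is not None:
            vectors[title] = vec
    if selected not in vectors:
        return []
    scores = [
        (cosine(vectors[selected], vec), title)
        for title, vec in vectors.items()
        if title != selected
    ]
    return [title for _, title in sorted(scores, reverse=True)[:k]]


# Made-up 2-D vectors standing in for a trained Word2Vec model.
word_vectors = {
    "dragon": [0.9, 0.1], "sword": [0.8, 0.2], "fantasy": [1.0, 0.0],
    "love": [0.1, 0.9], "letters": [0.2, 0.8], "romance": [0.0, 1.0],
}
books = {"The Dragon Sword": "fantasy", "Dragon Letters": "fantasy", "Love Letters": "romance"}
print(recommend_by_content("The Dragon Sword", books, word_vectors, k=2))
# → ['Dragon Letters', 'Love Letters']
```

Averaging word vectors is one simple way to turn a short text into a single vector; it ignores word order, which matters little for short titles. With gensim, a trained model's `model.wv` should be usable in place of the dictionary, since it supports both `word in model.wv` and `model.wv[word]`.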