Summary
We have implemented and optimized a parallel collaborative filtering engine which features:
- A compact data structure and related algorithms that we invented (we have not read about any data compression work in collaborative filtering papers)
- A multi-GPU, matrix-based solution
- Parallel locality sensitive hashing preprocessing algorithm for user clustering
Our deliverables will include:
- Performance graph of multiple algorithms we have implemented.
Here is a link to our final presentation slides:
Background and Approach and Results
We have put up separate writeups for each optimization technique we used. Detailed explanations of the optimization techniques, algorithms, designs, and results can be found in the following posts, please read:
More Results
- We measured the the performance using the
CycleTimer.h
in all previous assignments. - For the matrix solution, the precise setup is in the slides above. For the compressed data structure and nearest neighbor search implementations, they are tested on latedays cluster.
- Graphs can be found in those reports.
- We have datasets from Movielens ranging from 100k to 10m.
- As to what limits our speedup, please read those reports above.
References
- Streaming Similarity Search over one Billion Tweets using Parallel Locality-Sensitive Hashing
- Large-scale Parallel Collaborative Filtering for the Netflix Prize
- Standford Data Mining Textbook
List of work
We have done equal amount of work for this project