This MovieLens project is for the online Harvard Data Science Capstone course. The MovieLens 10M data set is used to develop a regression algorithm that minimizes the RMSE (root mean squared error) of the predicted movie ratings.
movielensscript.R - R script that optimizes the final prediction RMSE. It returns the predicted movie ratings and the final RMSE score. NOTE: This script requires more than 64 GB of RAM to run.
movielensreport.Rmd - MovieLens report in Rmd format. It provides the project overview, data analysis, modeling approach, results, and conclusions. It also has an appendix containing all helper functions, all data visualization R code, and other relevant code. NOTE: The results can be generated with movielensscript.R. In movielensreport.Rmd, the final results are entered manually to speed up report generation.
movielensreport.pdf - MovieLens report in PDF format. It contains the rendered figures, tables, text, and formulae of movielensreport.Rmd without the code chunks.
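
For readers unfamiliar with the metric, the RMSE that the script optimizes is the square root of the mean squared difference between the true and predicted ratings. The snippet below is only a minimal sketch of that definition. The function name `rmse`, the argument names, and the toy ratings are assumptions made for illustration; they are not taken from movielensscript.R and are not MovieLens data.

```r
# Root mean squared error between observed and predicted ratings.
# `rmse`, `true_ratings` and `predicted_ratings` are hypothetical names
# chosen for this sketch, not identifiers from movielensscript.R.
rmse <- function(true_ratings, predicted_ratings) {
  # Both vectors must describe the same set of ratings.
  stopifnot(length(true_ratings) == length(predicted_ratings))
  sqrt(mean((true_ratings - predicted_ratings)^2))
}

# Toy example with made-up ratings.
true_ratings      <- c(4.0, 3.5, 5.0, 2.0)
predicted_ratings <- c(3.5, 3.5, 4.0, 3.0)
rmse(true_ratings, predicted_ratings)  # 0.75
```

A lower value means the predictions are closer to the true ratings, and a value of 0 means every prediction matches exactly.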