Scaling benchmarks #8

Open
benmccann opened this issue Sep 13, 2016 · 3 comments

@benmccann

I've been looking at Spark implementations of Factorization Machines. I found that none of the existing open source implementations scale to a dataset with millions of features and hundreds of millions of examples. I'd be curious how this implementation is able to scale.

@geffy
Owner

geffy commented Sep 14, 2016

Hi @benmccann, I believe you should check https://github.com/dmlc/difacto -- from my point of view, it is the most scalable solution. Btw, FFM (https://www.csie.ntu.edu.tw/~cjlin/libffm/) is a good pure C++ implementation, which I've been able to run on my laptop on a dataset with ~10k features (25 non-zeros per sample) and ~30M samples.

tffm is mostly intended for research purposes, so I don't expect it to scale really well.
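
For context on why the number of non-zeros per sample matters more than the total feature count, here is a minimal sketch of the standard second-order factorization machine score, computed over the non-zero entries only. It is not taken from tffm, difacto, or libffm; the function names `fm_predict_sparse` and `fm_predict_naive`, the rank of 8, and the toy data are all assumptions made for illustration.

```python
import random


def fm_predict_sparse(w0, w, V, indices, values):
    """Second-order FM score for one sparse example (hypothetical helper).

    w0: bias; w: linear weights; V: one rank-k factor row per feature;
    indices/values: the example's non-zero feature ids and their values.
    Uses the identity
        sum_{i<j} <v_i, v_j> x_i x_j
          = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2],
    so the cost is O(k * nnz) and does not depend on the feature count.
    """
    k = len(V[0])
    score = w0 + sum(w[i] * x for i, x in zip(indices, values))
    for f in range(k):
        s = 0.0     # running sum of v_if * x_i
        s_sq = 0.0  # running sum of (v_if * x_i)^2
        for i, x in zip(indices, values):
            t = V[i][f] * x
            s += t
            s_sq += t * t
        score += 0.5 * (s * s - s_sq)
    return score


def fm_predict_naive(w0, w, V, indices, values):
    """Reference version that loops over every pair of non-zero features."""
    score = w0 + sum(w[i] * x for i, x in zip(indices, values))
    nnz = len(indices)
    for a in range(nnz):
        for b in range(a + 1, nnz):
            dot = sum(p * q for p, q in zip(V[indices[a]], V[indices[b]]))
            score += dot * values[a] * values[b]
    return score


# Toy example shaped like the dataset mentioned above:
# ~10k features, 25 non-zeros per sample (the rank of 8 is an assumption).
rng = random.Random(0)
n_features, rank, nnz = 10_000, 8, 25
w = [rng.gauss(0, 0.01) for _ in range(n_features)]
V = [[rng.gauss(0, 0.01) for _ in range(rank)] for _ in range(n_features)]
idx = rng.sample(range(n_features), nnz)
vals = [1.0] * nnz
print(fm_predict_sparse(0.0, w, V, idx, vals))
```

With this formulation, per-sample work grows with the 25 non-zeros and the rank, not with the 10k features; model size still grows with the feature count, which is where distributed parameter storage starts to matter.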

@kopopt

kopopt commented Sep 28, 2016

@geffy @benmccann I've been learning TensorFlow recently and developed a distributed version of factorization machines. I customized some operators so that its performance is comparable to difacto. You're welcome to take a look and give suggestions :) Thanks.

https://github.com/kopopt/fast_tffm

@arita37

arita37 commented Nov 11, 2017

I might be able to test it;
I just need to convert the Criteo dataset into the tffm input format. Is there any reference for the input format?
