Scaling benchmarks #8

benmccann · 2016-09-13T23:02:01Z

I've been looking at Spark implementations of Factorization Machines. I found that none of the existing open source implementations scale to a dataset with millions of features and hundreds of millions of examples. I'd be curious how this implementation is able to scale.

geffy · 2016-09-14T09:30:35Z

Hi @benmccann, I believe you should check https://github.com/dmlc/difacto -- from my point of view, it is the most scalable solution. Btw, FFM (https://www.csie.ntu.edu.tw/~cjlin/libffm/) is a good pure C++ implementation, which I've been able to run on my laptop on a dataset with ~10k features (25 non-zeros) and ~30M samples.

tffm is mostly for research purposes, so I don't expect it to scale really well.

kopopt · 2016-09-28T21:51:27Z

@geffy @benmccann I've been learning TensorFlow recently and developed a distributed factorization machine implementation. I customized some operators so that its performance is comparable to difacto's. You're welcome to take a look and give suggestions :) Thanks.

https://github.com/kopopt/fast_tffm

arita37 · 2017-11-11T13:55:36Z

I might be able to test; I just need to convert the Criteo dataset into the tffm input format. Is there any reference for the input format?

geffy added the help wanted label Oct 7, 2017
