add intra_distance_score evaluation #103

mokarakaya · 2018-03-16T23:27:39Z

Issue #90 (Diversification metrics for evaluation)

Intra-distance is probably the most widely considered metric of diversity.
Therefore I'd like to add it first.

maciejkula · 2018-03-17T17:57:56Z

Thank you for this!

I think to move this forward, we'll want to do two things.

Can we add some references to the metric in the docstring? Maybe some papers that use it?
We'll need to make this fast. From a cursory glance at the code, I suspect it's incredibly slow.

On making this fast: as far as I understand, we want to compute the cosine distance between the columns i and j of the (sparse) interaction matrix. There are a couple of things that we can do to make this faster than the current implementation:

We convert the interactions object to a sparse matrix, transpose it, and make it CSR. This way, rows represent items. Call this mat.
We get the lengths of the item vectors by calling lengths = mat.getnnz(axis=1) (I think; the axis argument always throws me).
We can get the length of the intersection of i and j by doing

numerator = np.in1d(mat[i].indices, mat[j].indices, assume_unique=True).sum()
denominator = lengths[i] * lengths[j]
distance = numerator / denominator

This way we don't need the cache either.

Let's return this in a (num_users, k * (k-1) / 2) array (it's a list of lists right now).
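The steps above might be sketched as follows. This is a minimal illustration, not the PR's implementation: the function name `intra_distance_sketch`, the `rec_lists` argument, and the toy matrix are hypothetical, and it uses `np.intersect1d` rather than `np.in1d` to count shared users. It also assumes the usual cosine normalisation for binary vectors (square root of the product of the lengths) and reports 1 - similarity, which differs from the snippet above.

```python
import numpy as np
import scipy.sparse as sp


def intra_distance_sketch(interactions, rec_lists):
    """Pairwise cosine distances between recommended items.

    interactions: (num_users, num_items) scipy sparse matrix of binary interactions.
    rec_lists: (num_users, k) integer array of recommended item ids (hypothetical input).
    Returns a (num_users, k * (k - 1) / 2) float array.
    """
    # Rows of `mat` are items; each row's indices are the users who interacted.
    mat = sp.csr_matrix(interactions).T.tocsr()
    mat.sort_indices()
    # Number of stored entries per item row, computed once.
    lengths = mat.getnnz(axis=1)

    rec_lists = np.asarray(rec_lists)
    num_users, k = rec_lists.shape
    out = np.zeros((num_users, k * (k - 1) // 2))

    for row, items in enumerate(rec_lists):
        col = 0
        for a in range(k):
            for b in range(a + 1, k):
                i, j = items[a], items[b]
                users_i = mat.indices[mat.indptr[i]:mat.indptr[i + 1]]
                users_j = mat.indices[mat.indptr[j]:mat.indptr[j + 1]]
                shared = np.intersect1d(users_i, users_j, assume_unique=True).size
                norm = np.sqrt(lengths[i] * lengths[j])
                # Items nobody interacted with are treated as maximally distant.
                similarity = shared / norm if norm > 0 else 0.0
                out[row, col] = 1.0 - similarity
                col += 1
    return out


# Toy data: 3 users x 3 items.
toy = sp.coo_matrix(np.array([[1, 1, 0],
                              [1, 0, 1],
                              [0, 1, 1]]))
distances = intra_distance_sketch(toy, [[0, 1, 2], [0, 1, 2], [0, 1, 2]])
```

In the toy data every pair of items shares exactly one of its two users, so each entry of `distances` comes out as 0.5.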

We can make it even faster by using the fact that indices are sorted and using numba, but this is probably a good first step.
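As a rough illustration of the sorted-indices idea, the intersection size can be counted with a single two-pointer pass instead of a generic membership test. The function name is hypothetical and the sketch is plain Python; a loop of this shape is the kind of code numba's `njit` decorator is meant to compile, though that step is not shown or measured here.

```python
def sorted_intersection_size(a, b):
    """Count the values common to two ascending sequences of unique integers."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Same user id in both item vectors: count it and advance both.
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


shared = sorted_intersection_size([0, 2, 5, 9], [2, 3, 9, 11])
```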

maciejkula

Thanks for the changes, I made some additional comments.

More generally:

Can you give me a sense of how slow/fast this is? Could you post how long it takes for the Movielens 100k dataset, for instance?
We will need a version of this for sequence-based models. Have a look at the MRR routines to see how the two differ.

maciejkula · 2018-04-03T12:55:51Z

spotlight/evaluation.py

+
+    distances = []
+    test = test.tocsr()
+    mat = train.tocoo().T.tocsr()


Is the coo conversion necessary?
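For plain scipy matrices, at least, the intermediate COO step does not change the result: transposing a CSR matrix and converting back to CSR gives the same item-by-user matrix. A small check, assuming the interactions have already been turned into a scipy sparse matrix (the variable names are illustrative):

```python
import numpy as np
import scipy.sparse as sp

dense = np.array([[1, 0, 1],
                  [0, 1, 0]])
coo = sp.coo_matrix(dense)

via_coo = coo.T.tocsr()          # COO -> transpose -> CSR
via_csr = coo.tocsr().T.tocsr()  # CSR -> transpose (CSC) -> CSR

# Both routes yield the same (num_items, num_users) matrix.
same = np.array_equal(via_coo.toarray(), via_csr.toarray())
```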

maciejkula · 2018-04-03T12:58:17Z

spotlight/evaluation.py

+    """
+
+    distances = []
+    test = test.tocsr()


What do we need test for? For knowing which users to compute the predictions for?

Maybe a cleaner way of doing the same would be to allow the user to pass in an optional array of user ids for which the metric should be computed.
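One way the optional argument could look is sketched below. The helper name `resolve_user_ids` and its parameters are hypothetical, not part of Spotlight; the point is only that `None` falls back to every user.

```python
import numpy as np


def resolve_user_ids(num_users, user_ids=None):
    """Return the users to evaluate: all users unless a subset is given."""
    if user_ids is None:
        return np.arange(num_users)
    user_ids = np.asarray(user_ids)
    if user_ids.size and (user_ids.min() < 0 or user_ids.max() >= num_users):
        raise ValueError("user_ids must lie in [0, num_users)")
    return user_ids


all_users = resolve_user_ids(4)
subset = resolve_user_ids(4, user_ids=[1, 3])
```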

maciejkula · 2018-04-03T12:58:59Z

spotlight/evaluation.py

+    for user_id, row in enumerate(test):
+        predictions = -model.predict(user_id)
+        rec_list = predictions.argsort()[:k]
+        distance = [


I personally find nested list comprehensions very confusing. Could we use nested for loops here?
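For illustration, the pairwise part could be written with explicit loops along these lines. `pair_distances` and the toy distance function are placeholders, not the PR's code.

```python
def pair_distances(rec_list, distance):
    """Distances for every unordered pair of items in one recommendation list."""
    distances = []
    for first in range(len(rec_list)):
        # Start after `first` so each pair is visited exactly once.
        for second in range(first + 1, len(rec_list)):
            distances.append(distance(rec_list[first], rec_list[second]))
    return distances


# Toy distance: absolute difference of item ids.
result = pair_distances([3, 7, 8], lambda i, j: abs(i - j))
```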

mokarakaya · 2018-04-03T21:54:00Z

Thank you very much for the comments. I agree.
I'll check and update accordingly.

In addition to these comments, I'm still looking for a way to make the distance function an input parameter. The current function will be the default one since it's really fast (I'll post exact times separately).
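Passing the distance in as a parameter might look roughly like this. Everything here is a hypothetical sketch: `mean_intra_distance`, `default_distance`, and the `item_users` mapping are illustrative names, and the default shown is a simple set-based cosine distance rather than the PR's actual function.

```python
import math
from itertools import combinations


def default_distance(users_i, users_j):
    """Cosine distance between two items given the sets of users who chose them."""
    if not users_i or not users_j:
        return 1.0
    shared = len(users_i & users_j)
    return 1.0 - shared / math.sqrt(len(users_i) * len(users_j))


def mean_intra_distance(rec_list, item_users, distance_fn=default_distance):
    """Average pairwise distance over one recommendation list."""
    pairs = list(combinations(rec_list, 2))
    if not pairs:
        return 0.0
    total = sum(distance_fn(item_users[i], item_users[j]) for i, j in pairs)
    return total / len(pairs)


item_users = {0: {1, 2}, 1: {1, 2}, 2: {3}}
score = mean_intra_distance([0, 1, 2], item_users)
# A caller can swap in another measure, e.g. Jaccard distance.
jaccard = mean_intra_distance(
    [0, 1, 2], item_users,
    distance_fn=lambda a, b: 1.0 - len(a & b) / len(a | b),
)
```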

mokarakaya · 2018-09-30T15:20:57Z

I've fixed the review comments except for the sequence-based models solution.

Calling the intra_distance_score function in the tests takes 6 seconds (the function alone) when I run it locally.

I still need to work out how to run this on sequence-based models.
We need to convert the sequences into per-item arrays of user ids in order to compute the distance between items.

So we need a matrix like this to calculate the distance:
[[userId1, userId2, userId3],
[userId2, userId3]]

where each row represents an item.
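A minimal sketch of that conversion, assuming each sequence row belongs to one user and that 0 is a padding value (both are assumptions here, as are the function name and the toy data):

```python
from collections import defaultdict


def item_user_lists(sequences, user_ids, padding=0):
    """Map each item id to the sorted list of users whose sequences contain it."""
    users_by_item = defaultdict(set)
    for user_id, sequence in zip(user_ids, sequences):
        for item_id in sequence:
            # Skip padding entries; they do not correspond to a real item.
            if item_id != padding:
                users_by_item[item_id].add(user_id)
    return {item: sorted(users) for item, users in users_by_item.items()}


sequences = [[0, 5, 7],   # user 1
             [5, 7, 9],   # user 2
             [0, 0, 5]]   # user 3
mapping = item_user_lists(sequences, user_ids=[1, 2, 3])
```

Each value in `mapping` plays the role of one row of the matrix described above: the users associated with a single item.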

Any advice or guidance would be greatly appreciated.

mokarakaya added 3 commits March 17, 2018 00:19

add intra_distance_score evaluation

bc960a1

fix styling issues.

797c79d

fix styling

2ca8b43

mokarakaya added 4 commits March 23, 2018 00:27

apply review comments for intra_distance_score

b7118a5

remove unused imports

4148715

revert changes in test_precision_recall

a32d76e

styling

316f101

maciejkula requested changes Apr 3, 2018

mokarakaya added 6 commits September 5, 2018 22:44

fix review comments.

1c35e6d

Merge branch 'master' of https://github.com/maciejkula/spotlight into…

448f891

… intra_distance

fix review comments for intra_distance

2cab76f

fix travis styling build

a62fc2c

fix documentation (test is replaced with user_ids)

3a36f76

intra_distance - calculate lengths only once

6896c20
