Handling constant subsequence #11

oentaryorj · 2018-12-13T05:58:00Z

On some occasions, a time series subsequence can be flat/constant for a while, which leads to a zero standard deviation for that subsequence. However, the current code does not seem to handle this case yet. Is there a way to fix this?

vanbenschoten · 2018-12-13T18:22:46Z

@oentaryorj This is a great call-out. There are a couple of options:

1. We could remove flat/constant segments as a pre-processing routine in the library.
2. We could report these segments back as motifs (though admittedly this is a "trivial" motif that may not be meaningful).

Since it sounds like you've encountered this situation in practice, what are your thoughts?

aouyang1 · 2018-12-29T00:56:38Z

@oentaryorj Something that's worked for me was to add some very small noise to the signal. This has the added benefit that the flat regions can be detected later on as different segments of the time series.
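For anyone who wants to try the noise approach, here is a minimal sketch of what it could look like. The helper names `add_small_noise` and `rolling_std` and the `scale` value are hypothetical and not part of the library; the right noise amplitude depends on the scale of your data.

```python
import numpy as np

def add_small_noise(ts, scale=1e-6, seed=None):
    """Return a copy of ts with low-amplitude Gaussian noise added.

    `scale` is an assumed default; pick something well below the
    smallest variation you care about in the real signal.
    """
    rng = np.random.default_rng(seed)
    ts = np.asarray(ts, dtype=float)
    return ts + rng.normal(0.0, scale, size=ts.shape)

def rolling_std(ts, m):
    """Standard deviation of every length-m window (needs NumPy >= 1.20)."""
    windows = np.lib.stride_tricks.sliding_window_view(np.asarray(ts, dtype=float), m)
    return windows.std(axis=1)

# A sine wave followed by a perfectly flat segment.
ts = np.concatenate([np.sin(np.linspace(0, 6, 50)), np.full(30, 2.0)])
noisy = add_small_noise(ts, scale=1e-6, seed=0)

print(rolling_std(ts, 8).min())     # zero inside the flat segment
print(rolling_std(noisy, 8).min())  # small but strictly positive
```

The noise only has to break exact ties, so the perturbed series stays visually identical to the original while every window gets a nonzero standard deviation.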

oentaryorj · 2019-01-02T06:18:36Z

@vanbenschoten: Sorry for the late response - just back from vacation :) I would be more inclined toward option 2, although we could probably put an option to ignore/skip the flat segments.

@aouyang1: Thanks for the suggestion. Did you need to modify the current code to make this work? Did you encounter any other numerical issues?

ameya98 · 2019-06-18T00:58:29Z

If you read the original SAX paper (SAX was invented by the same people who invented the Matrix Profile) you will see that they handle subsequences with very low standard deviation (that includes the constant subsequence, as that has standard deviation zero) by not dividing by the standard deviation when normalizing. (The threshold for the standard deviation is a parameter here.)
I think incorporating that into the matrix profile makes the most sense.
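A rough sketch of that thresholded normalization, as I understand it. The function name `znormalize` and the default `std_threshold` are assumptions made for illustration, and keeping the mean subtraction in the low-variance branch is my reading rather than something taken from the library.

```python
import numpy as np

def znormalize(subseq, std_threshold=1e-8):
    """Z-normalize a subsequence, skipping the division when it is (nearly) flat.

    `std_threshold` is a hypothetical parameter; the appropriate value
    depends on the noise floor of the data.
    """
    subseq = np.asarray(subseq, dtype=float)
    centered = subseq - subseq.mean()
    std = subseq.std()
    if std < std_threshold:
        # Flat subsequence: return the centered values without dividing.
        return centered
    return centered / std

print(znormalize([3.0, 3.0, 3.0, 3.0]))  # all zeros, no NaN or inf
print(znormalize([1.0, 2.0, 3.0]))       # ordinary z-normalization
```

With this rule a constant subsequence normalizes to all zeros instead of producing a division by zero.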

aouyang1 · 2019-06-18T02:03:03Z

Agreed. I also found this via Stack Overflow: https://github.com/scikit-learn/scikit-learn/blob/7389dbac82d362f296dc2746f10e43ffa1615660/sklearn/preprocessing/data.py#L70. It treats the standard deviation as 1, keeping the existing subsequence values and not normalizing.

Since we're trying to compare two subsequences such that one subsequence's spectral power does not bias the comparison over the other, would it make sense to divide the signal such that we satisfy Parseval's theorem?

oentaryorj · 2019-06-18T02:22:21Z

Thanks for the input. So I guess "ignoring" the standard deviation means option 2 in @vanbenschoten's reply (i.e., treating a constant subsequence as a "trivial" motif)? This makes sense to me, although it may depend on the application context.

ameya98 · 2019-06-18T03:04:15Z

Performing the mean subtraction but not the division by the standard deviation in the distance profile computation should suffice as well?
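To make the idea concrete, here is a brute-force sketch of a distance profile that always subtracts the mean but replaces a near-zero standard deviation with 1 before dividing. This is not the FFT-based computation the library uses, and `distance_profile` and `std_threshold` are hypothetical names chosen for this example.

```python
import numpy as np

def distance_profile(ts, query, std_threshold=1e-8):
    """Euclidean distance from `query` to every window of `ts`.

    Windows (and the query) are mean-centered; the division by the
    standard deviation is skipped when it falls below `std_threshold`.
    Naive O(n * m) version for illustration only.
    """
    ts = np.asarray(ts, dtype=float)
    query = np.asarray(query, dtype=float)
    m = len(query)
    windows = np.lib.stride_tricks.sliding_window_view(ts, m)

    def normalize(x):
        centered = x - x.mean(axis=-1, keepdims=True)
        std = x.std(axis=-1, keepdims=True)
        # Treat a (nearly) zero standard deviation as 1, i.e. do not divide.
        safe_std = np.where(std < std_threshold, 1.0, std)
        return centered / safe_std

    return np.linalg.norm(normalize(windows) - normalize(query), axis=-1)

ts = [0, 1, 0, 1, 5, 5, 5, 5, 0, 1, 0, 1]
print(distance_profile(ts, [2, 2, 2, 2]))  # finite everywhere, 0 at the flat window
```

One consequence worth noting: two flat subsequences at different levels end up at distance 0 from each other, which is the "trivial" motif behaviour discussed above.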

JaKasb mentioned this issue Jan 14, 2019

Tests for Negative Standard Deviation #22

Closed

peterdhansen mentioned this issue Aug 13, 2019

Strange behaviour when testing with constant values #71

Closed
