Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strided dataset feature branch #162

Open
wants to merge 10 commits into
base: develop
Choose a base branch
from

Conversation

Diego-Llanes
Copy link
Collaborator

@Diego-Llanes Diego-Llanes commented Jun 11, 2024

This pull request contains the StridedDataset feature addition. The only file that have additions / changes is src/neuromancer/dataset.py.
All of the class parameters are well documented and have docstrings.

EDIT: There is now also an example documenting the usecase of this new feature.

Contributors:

>
>
Co-authored-by: Seth Lindberg Briney [email protected]
Co-authored-by: Harry Qiang [email protected]
>
>
Co-authored-by: Seth Lindberg Briney [email protected]
Co-authored-by: Harry Qiang [email protected]
…ure_branch' into StridedDataset_feature_branch
>
>
Co-authored-by: Seth Lindberg Briney [email protected]
Co-authored-by: Harry Qiang [email protected]
>
>
Co-authored-by: Seth Lindberg Briney [email protected]
Co-authored-by: Harry Qiang [email protected]
Merged develop into local fork
>
>
Co-authored-by: Seth Lindberg Briney [email protected]
Co-authored-by: Harry Qiang [email protected]
@madelynshapiro
Copy link
Collaborator

@Diego-Llanes would you please upload a small example (can be a modified snippet from one of your full notebooks) demonstrating the usage to accompany the PR?

@Diego-Llanes
Copy link
Collaborator Author

Diego-Llanes commented Jun 13, 2024

@Diego-Llanes would you please upload a small example (can be a modified snippet from one of your full notebooks) demonstrating the usage to accompany the PR?

Just added!

@Diego-Llanes Diego-Llanes added the enhancement New feature or request label Jun 13, 2024
Copy link
Collaborator

@madelynshapiro madelynshapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent notebook!

batch['name'] = self.name
return batch

def remainder(self, n, d):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def remainder(self, n, d):
@staticmethod
def remainder(n, d):

Copy link
Contributor

@drgona drgona left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notebook should be modified:

  • the first image in the notebook is inconsistent with parameters in the text - create a new image illustrating parameters N, L, and stride
  • include training example of simple system ID model demonstrating the use of the new dataset
  • include more visualizations of how the data is processed into subsequences - it is not clear what I am looking at in the last plot, add more explanation, always label axes in the plots

@RBirmiwal
Copy link
Collaborator

strided_dataset_rahul_test.ipynb.zip

I agree with Jan.
In addition have attempted to play around with the strided dataset to ensure it works. I have created a notebook zipped up to this comment. I am still confused at times:

  • In DictDataset, let's say D:= a dictDataset, i can do D['X'] --> 3D tensor
  • In StridedDataset S:= StridedDataset, I cannot do S['X'], i have to do S[idx]['X] --> 2D tensors

While the user can figure out how to do proper reshaping and data creation, this is a difference in our API

  • what if instead we had S['X'] --> list of subsequence tensors for X
  • Also the tensor X is 2D not 3D......

I attempted to do Neural ODE example with StridedDataset. For L >= nsteps looks like it trains, but the performance is not as good as standard DictDataset.

For L < nsteps, it breaks.

I like this functionality, but we need to ensure it works and performs on-par for all the neuromancer use-cases not just Farama DPC. If there are situations where the StridedDataset will fail (as indicated when L > nsteps), then those need to be handled appropriately.

@RBirmiwal
Copy link
Collaborator

Also would like an example of a non-trivial update_fn, right now it is

def update_initial_condition(d):
    d['xn_2'] = d["xn"][0:1, :]
    return d

for the case of the neural ode where our keys are "X" and "xn". I don't understand how "xn_2" would play a role/being used. So what are cases where this would be necessary?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants