Full dataset normalization biases test set #1238

Open
theely opened this issue Jun 1, 2024 · 1 comment

Comments

theely commented Jun 1, 2024

In FinRL_PortfolioOptimizationEnv_Demo.ipynb all data used for training and testing are normalised at once:

portfolio_norm_df = GroupByScaler(by="tic", scaler=MaxAbsScaler).fit_transform(portfolio_raw_df)

MaxAbsScaler divides each ticker's values by their maximum absolute value, so fitting it on the full dataset tells the model when, across all observations (training + test), the price of each stock in the portfolio was at its highest.
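To make the concern concrete, here is a minimal sketch using plain pandas rather than FinRL's GroupByScaler. The ticker `AAA`, the prices, and the train/test split are all made up for illustration; the max-abs scaling is written by hand to mimic what MaxAbsScaler does per ticker.

```python
import pandas as pd

# Hypothetical toy prices for one ticker; the last two rows stand in for the test period.
prices = pd.DataFrame({
    "tic": ["AAA"] * 5,
    "close": [10.0, 12.0, 11.0, 20.0, 16.0],
})
train, test = prices.iloc[:3], prices.iloc[3:]

# Max-abs scaling fitted on the full dataset (training + test), per ticker.
full_max = prices.groupby("tic")["close"].transform(lambda s: s.abs().max())
scaled_full = prices["close"] / full_max

# Max-abs scaling fitted on the training rows only, per ticker.
train_max = train.groupby("tic")["close"].agg(lambda s: s.abs().max())
scaled_train_only = train["close"] / train["tic"].map(train_max)

print(scaled_full.tolist())        # training rows stay below 1.0
print(scaled_train_only.tolist())  # the training peak maps to 1.0
```

With full-dataset scaling, every training value stays below 1.0, which signals that a higher price is still to come in the test period. Fitting on the training rows alone removes that hint.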

@C4i0kun should we not instead normalise only the batch of observations for a given time step? That would avoid introducing forward-looking information that is supposed to be unknown at that point.
For example, we could delegate the normalisation to env_portfolio_optimization.py:

# define data to be used in this time step
self._data_raw = self._df[
    (self._df[self._time_column] >= start_time)
    & (self._df[self._time_column] <= end_time)
][[self._time_column, self._tic_column] + self._features]

# normalise observations
self._data = GroupByScaler(by="tic", scaler=MaxAbsScaler, columns=['close','high','low']).fit_transform(self._data_raw)
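A standalone sketch of the same idea, outside the environment class, might look like the following. It uses plain pandas instead of GroupByScaler, and `normalise_window`, the `date` column name, and the toy data are assumptions made for illustration, not part of FinRL.

```python
import pandas as pd

def normalise_window(df, start_time, end_time, features,
                     time_column="date", tic_column="tic"):
    """Return the rows in [start_time, end_time], max-abs scaled per ticker.

    Hypothetical helper: the scale is computed from the window only, so no
    observation after end_time can influence it.
    """
    window = df[(df[time_column] >= start_time) & (df[time_column] <= end_time)]
    window = window[[time_column, tic_column] + features].copy()
    # Per-ticker maximum absolute value of each feature within the window.
    scale = window.groupby(tic_column)[features].transform(lambda s: s.abs().max())
    # Replace zero scales with 1.0 to avoid dividing by zero.
    window[features] = window[features] / scale.replace(0, 1.0)
    return window

# Toy data: the 20.0 on the last day lies outside the observation window.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
    "tic": ["AAA"] * 4,
    "close": [10.0, 12.0, 11.0, 20.0],
})
obs = normalise_window(df, pd.Timestamp("2024-01-01"), pd.Timestamp("2024-01-03"), ["close"])
print(obs["close"].tolist())
```

The later price of 20.0 has no effect on the scaled window. One trade-off to keep in mind is that the scale now changes from step to step, so the same raw price can map to different normalised values in different windows.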

@BruceYanghy
Member

Thank you for bringing up the issue. Currently, the FinRL library is extremely poorly maintained. Rest assured, I will reorganize a team to ensure its proper maintenance.

Best regards,

Bruce Yang
