Full dataset normalization biases test set #1238

theely · 2024-06-01T12:19:32Z

In FinRL_PortfolioOptimizationEnv_Demo.ipynb all data used for training and testing are normalised at once:

portfolio_norm_df = GroupByScaler(by="tic", scaler=MaxAbsScaler).fit_transform(portfolio_raw_df)

By doing so, we reveal to the model when, across all observations (training + test), the price of each stock in the portfolio was at its highest.
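To make the leak concrete, here is a minimal, dependency-free sketch of max-abs scaling. It does not use FinRL's `GroupByScaler` or scikit-learn's `MaxAbsScaler`; the helper `max_abs_scale` and the price values are hypothetical and only illustrate the arithmetic.

```python
def max_abs_scale(values, reference):
    """Divide each value by the largest absolute value found in `reference`."""
    peak = max(abs(v) for v in reference)
    return [v / peak for v in values]


# Hypothetical closing prices for one ticker; the last two are the test period.
train_prices = [10.0, 12.0, 11.0]
test_prices = [15.0, 20.0]
all_prices = train_prices + test_prices

# Scaler fitted on everything: training values are divided by a peak (20.0)
# that is only reached during the test period.
leaky_train = max_abs_scale(train_prices, all_prices)

# Scaler fitted on the training period only: the peak is 12.0.
clean_train = max_abs_scale(train_prices, train_prices)

print(leaky_train)  # [0.5, 0.6, 0.55]
print(clean_train)
```

In the leaky case no training value reaches 1.0, which by itself signals that a higher price is still to come.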

@C4i0kun should we not instead normalize only the batch of observations for a given time step? That would avoid introducing forward-looking information that is supposed to be unknown at that point.
For example, we could delegate the normalisation to env_portfolio_optimization.py:

# define data to be used in this time step
self._data_raw = self._df[
    (self._df[self._time_column] >= start_time)
    & (self._df[self._time_column] <= end_time)
][[self._time_column, self._tic_column] + self._features]

# normalise observations
self._data = GroupByScaler(by="tic", scaler=MaxAbsScaler, columns=['close','high','low']).fit_transform(self._data_raw)
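The idea of scaling only the rows inside the current window can be sketched without pandas or FinRL. The function `scale_window_by_ticker`, the row layout, and the sample values below are assumptions made for illustration; they are not part of `env_portfolio_optimization.py`.

```python
from collections import defaultdict


def scale_window_by_ticker(rows, features):
    """Max-abs scale `features` per ticker, using only the rows passed in."""
    # First pass: largest absolute value per (ticker, feature) in this window.
    peaks = defaultdict(lambda: defaultdict(float))
    for row in rows:
        for f in features:
            peaks[row["tic"]][f] = max(peaks[row["tic"]][f], abs(row[f]))

    # Second pass: divide by the window-local peak (guarding against zero).
    scaled = []
    for row in rows:
        new_row = dict(row)
        for f in features:
            peak = peaks[row["tic"]][f]
            new_row[f] = row[f] / peak if peak else 0.0
        scaled.append(new_row)
    return scaled


# Hypothetical data: two tickers, three dates.
rows = [
    {"date": 1, "tic": "AAA", "close": 10.0},
    {"date": 2, "tic": "AAA", "close": 20.0},
    {"date": 3, "tic": "AAA", "close": 40.0},
    {"date": 1, "tic": "BBB", "close": 5.0},
    {"date": 2, "tic": "BBB", "close": 4.0},
    {"date": 3, "tic": "BBB", "close": 8.0},
]

# Only dates 1..2 are visible at this step; date 3 never influences the scale.
window = [r for r in rows if 1 <= r["date"] <= 2]
scaled_window = scale_window_by_ticker(window, ["close"])
```

One trade-off to keep in mind: with window-local scaling the divisor changes from step to step, so the same raw price can map to different scaled values in different windows. Fitting the scaler once on the training period and reusing it on the test period is another way to avoid the leak.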

BruceYanghy · 2024-06-24T13:22:06Z

Thank you for bringing up the issue. Currently, the FinRL library is extremely poorly maintained. Rest assured, I will reorganize a team to ensure its proper maintenance.

Best regards,

Bruce Yang
