
[Feature Request] Auto early stopping in Sklearn API #3313

Closed
rohan-gt opened this issue Aug 18, 2020 · 12 comments · May be fixed by #5808
Comments

@rohan-gt

Is it possible to perform early stopping using cross-validation or automatically sampling data from the provided train set without explicitly specifying an eval set?

@guolinke
Collaborator

lgb.cv supports early stopping, does it meet your request?

@rohan-gt
Author

rohan-gt commented Aug 18, 2020

@guolinke I was actually looking for the same feature within the Sklearn API. I've changed the title now.

@rohan-gt rohan-gt changed the title [Feature Request] Auto early stopping [Feature Request] Auto early stopping in Sklearn API Aug 19, 2020
@kmedved

kmedved commented Aug 21, 2020

This is how sklearn's HistGradientBoostingClassifier performs early stopping (by sampling the training data). There are significant benefits to this in terms of compatibility with the rest of the sklearn ecosystem, since most sklearn tools don't allow for passing validation data, or early stopping rounds.

Enabling this sort of functionality would allow a significant speedup in hyperparameter searching by taking advantage of sklearn's cross_val_score or RandomizedSearchCV, which are efficiently multiprocessed and can evaluate either multiple sets of parameters at once, or multiple folds at once. This scales better for many datasets than throwing more cores at LightGBM directly.

Ideally this would be implemented as an option, of course, and not replace the existing behavior.

@jameslamb
Collaborator

For your consideration, we did have a discussion about this with the scikit-learn maintainers in #2270. Using early stopping with a random subset of the data (not a validation set you create yourself) can lead to misleading results, because of information leaking from the training data to the validation data.

That being said...I personally favor adding automatic early stopping to the scikit-learn interface specifically, even if that means that we use train_test_split() like they do and set some early_stopping_rounds to pass through to LightGBM. The goal of the scikit-learn API is to allow people who are using scikit-learn to plug in LightGBM as a possible model in things like GridSearchCV. Even if we disagree with the decision that scikit-learn made about early stopping for HistGradientBoostingClassifier, now that that decision has been made I think that LightGBM's scikit-learn interface should adapt to it.

But I am not a Python maintainer here, so will defer to @guolinke and others.

@kmedved

kmedved commented Aug 21, 2020

Thanks @jameslamb - that's helpful background, and I see the concerns (especially since you can't pass a cv object into HistGradientBoostingClassifier, so are at the mercy of train_test_split).

I would find this functionality helpful despite these drawbacks, but it is obviously not essential.

@rohan-gt
Author

rohan-gt commented Nov 9, 2020

@guolinke is it possible to add this functionality like @jameslamb mentioned?

@StrikerRUS
Collaborator

now that that decision has been made I think that LightGBM's scikit-learn interface should adapt to it.

But please note that things might change:

Nothing really defined yet, but we're actually trying to go in the reverse direction. ... Basically, we're trying to move any parameter that is data-specific into fit, or at least out of __init__. Though again, nothing definite for now.
#2966 (comment)

I expect some changes in the sklearn public API in the (near) future.

@StrikerRUS
Collaborator

Closed in favor of tracking this in #2302. We decided to keep all feature requests in one place.

Contributions of this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.

@ClaudioSalvatoreArcidiacono

ClaudioSalvatoreArcidiacono commented Mar 26, 2023

I have been working on this feature lately; it would be great if someone could review it :)
Here is the PR Link

@lorenzwalthert

Timely PR, I was looking for exactly this feature 😄 . @ClaudioSalvatoreArcidiacono it seems like your PR passes all CI but is blocked from review until you sign your commits the way the maintainers expect. It would be great if this PR could cross the finish line before new merge conflicts arrive.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues
including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2023
@jameslamb
Collaborator

Sorry, this was locked accidentally. Just unlocked it.

@microsoft microsoft unlocked this conversation Aug 18, 2023