
[python-package] params are lost when copying a Booster #5539

Open
hsorsky opened this issue Oct 12, 2022 · 7 comments · May be fixed by #6101

@hsorsky
Contributor

hsorsky commented Oct 12, 2022

Description

A copied Booster loses its params attribute.

Reproducible example

import copy

import lightgbm
import pandas as pd
from lightgbm.basic import Booster

train_set = pd.DataFrame(
    {
        "feature1": [1, 2, 3],
        "feature2": [1, 2, 1],
    }
)
booster = Booster(
    params={"num_leaves": 5}, 
    train_set=lightgbm.Dataset(data=train_set),
)
print(booster.params)  # {'num_leaves': 5}

copied = copy.deepcopy(booster)
print(copied.params)  # the params passed at construction are lost here

Environment info

LightGBM version or commit hash: 3.3.2

Command(s) you used to install LightGBM

pip install lightgbm==3.3.2

Additional Comments

Perhaps Booster.model_from_string should set it after initialising the object?
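
For contrast, a pickle round trip seems to keep the attribute (untested sketch, reusing booster from the example above), which suggests the loss is specific to the copy path:

import pickle

# Untested sketch: pickling goes through Booster.__getstate__, which keeps
# the Python-side attributes, so params appears to survive this path.
restored = pickle.loads(pickle.dumps(booster))
print(restored.params)  # expected to still show {'num_leaves': 5}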

@jameslamb changed the title from "Booster doesnt have params if copied." to "[python-package] params are lost when copying a Booster" Oct 12, 2022
@jameslamb added the bug label Oct 12, 2022
@jameslamb
Collaborator

Thanks for this report! I agree that it's desirable for the new Booster's .params attribute to match the one it was copied from.

Here's the relevant code run during such a copy.

def __copy__(self) -> "Booster":
    return self.__deepcopy__(None)

def __deepcopy__(self, _) -> "Booster":
    model_str = self.model_to_string(num_iteration=-1)
    booster = Booster(model_str=model_str)
    return booster

One approach that might be taken to solve this is to mimic what's done when pickling / unpickling boosters, where every attribute except handle (a pointer to a Booster object on the C++ side) is preserved via Booster.__dict__.

def __getstate__(self) -> Dict[str, Any]:
    this = self.__dict__.copy()
    handle = this['handle']
    this.pop('train_set', None)
    this.pop('valid_sets', None)
    if handle is not None:
        this["handle"] = self.model_to_string(num_iteration=-1)
    return this
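
For illustration, a __deepcopy__ built on that idea might look roughly like this (an untested sketch, not a final implementation):

import copy

def __deepcopy__(self, memo):
    # Rebuild the C++ side from the serialized model string...
    model_str = self.model_to_string(num_iteration=-1)
    booster = Booster(model_str=model_str)
    # ...then carry over every plain-Python attribute, mirroring what
    # __getstate__ does (train_set / valid_sets are dropped there too,
    # and handle is rebuilt rather than copied).
    for key, value in self.__dict__.items():
        if key not in ('handle', 'train_set', 'valid_sets'):
            booster.__dict__[key] = copy.deepcopy(value, memo)
    return booster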

But I'd have to look into it more closely... there are several ways to solve this, I think. @hsorsky, are you interested in attempting a pull request? If not, no worries, and we're thankful for the report... but I wanted to give you the first opportunity.

@hsorsky
Contributor Author

hsorsky commented Oct 12, 2022

Yes, I'm interested in attempting, but I might need some guidance.

I assume it's preferable to have Booster.params be set in Booster.model_from_string rather than Booster.__deepcopy__?

If Booster.__deepcopy__ suffices, does something like

    def __deepcopy__(self, _):
        model_str = self.model_to_string(num_iteration=-1)
        booster = Booster(model_str=model_str)
        if not booster.params:
            booster.params = deepcopy(self.params)
        return booster

work?

If not, and the fix needs to live in Booster.model_from_string, would something similar to _load_pandas_categorical work, where we'd iterate over the lines of model_str from "parameters:" to "end of parameters" with custom handling for each param and its expected type? (That sounds quite complicated.)
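
Something like this rough sketch, maybe (untested; the hard part it skips is converting each value to its proper type):

def _parse_params_section(model_str: str) -> dict:
    # Hypothetical helper: scan the text between the "parameters:" and
    # "end of parameters" markers of the model string. Each line in that
    # section looks like "[name: value]". Values stay raw strings here;
    # the per-param type handling is the complicated bit.
    params = {}
    in_section = False
    for line in model_str.split('\n'):
        if line == 'parameters:':
            in_section = True
        elif line == 'end of parameters':
            break
        elif in_section and line.startswith('[') and ': ' in line:
            name, value = line[1:-1].split(': ', 1)
            params[name] = value
    return params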

@jmoralez
Collaborator

With #5424 merged, we can actually get all the params back from the model string, which might be something to consider, i.e.

import lightgbm as lgb
import numpy as np

ds = lgb.Dataset(np.random.rand(100, 2))
bst = lgb.train({'num_leaves': 5, 'verbosity': -1}, ds)
bst2 = lgb.Booster(model_str=bst.model_to_string())
print(bst2._get_loaded_param())

returns all the parameters. While working on that, I remember that at one point I added it to model_from_string as well, and these lines

if not keep_training_booster:
    booster.model_from_string(booster.model_to_string()).free_dataset()

made it so that the booster returned from the train function had all the parameters.

@jmoralez
Collaborator

jmoralez commented Oct 13, 2022

With these changes:

--- a/python-package/lightgbm/basic.py
+++ b/python-package/lightgbm/basic.py
@@ -2802,6 +2802,7 @@ class Booster:
             self.__get_eval_info()
             self.pandas_categorical = train_set.pandas_categorical
             self.train_set_version = train_set.version
+            self.params = params
         elif model_file is not None:
             # Prediction task
             out_num_iterations = ctypes.c_int(0)
@@ -2818,13 +2819,12 @@ class Booster:
             self.pandas_categorical = _load_pandas_categorical(file_name=model_file)
             if params:
                 _log_warning('Ignoring params argument, using parameters from model file.')
-            params = self._get_loaded_param()
+            self.params = self._get_loaded_param()
         elif model_str is not None:
             self.model_from_string(model_str)
         else:
             raise TypeError('Need at least one training dataset or model file or model string '
                             'to create Booster instance')
-        self.params = params
 
     def __del__(self) -> None:
         try:
@@ -3582,6 +3582,7 @@ class Booster:
             ctypes.byref(out_num_class)))
         self.__num_class = out_num_class.value
         self.pandas_categorical = _load_pandas_categorical(model_str=model_str)
+        self.params = self._get_loaded_param()
         return self

We'd get all parameters back from the train function and when copying.

import copy

import lightgbm as lgb
import numpy as np

ds = lgb.Dataset(np.random.rand(100, 2))
bst = lgb.train({'num_leaves': 5, 'verbosity': -1}, ds)
bst2 = copy.deepcopy(bst)
print(len(bst.params))  # 90
print(len(bst2.params))  # 90

@jameslamb
Collaborator

jameslamb commented Oct 13, 2022

OK yeah, nice! At first glance, I like that (@jmoralez's suggested approach above).

I'm going to be traveling for the next few days and might not be available much. So @hsorsky, if you put up a pull request (with tests, please 😀), I might not be able to look at it until next week, but hopefully @jmoralez will be able to when time permits.

@jameslamb
Collaborator

👋🏻 I'm working on a fix for this right now.

@afikbar

afikbar commented Nov 2, 2023

@jameslamb I've just encountered it too; happy to assist with the PR if you're too busy (:
