Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Unify dump_model output (fixes #5962) #5964

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

thatlittleboy
Copy link

@thatlittleboy thatlittleboy commented Jul 9, 2023

Closes #5962.

Note: PR is not ready for review yet. Still iterating on this, but I'm open to suggestions at this stage if any. Thank you!

Changes

  • Added leaf_count to the string output of Tree::ToJSON() when num_leaves=1.

@thatlittleboy
Copy link
Author

@microsoft-github-policy-service agree

@jameslamb jameslamb changed the title Unify dump_model output WIP: Unify dump_model output Jul 9, 2023
@jameslamb jameslamb changed the title WIP: Unify dump_model output WIP: Unify dump_model output (fixes #5962) Jul 9, 2023
@thatlittleboy
Copy link
Author

Hi @jameslamb , I gave it a shot with the changes in this PR, but after re-installing and running pytest

sh build-python.sh install
pytest tests/python_package_test/test_engine.py::test_dump_model

I was still getting segmentation fault's, and I'm not quite sure how to debug. Swapping out leaf_count_[0] in the str_buf call for a literal number like 777 works, so the error is definitely arising because of leaf_count_[0].

For a stump tree, is leaf_count_ or internal_count_ undefined?

@jameslamb
Copy link
Collaborator

For a stump tree, is leaf_count_ or internal_count_ undefined?

I'm traveling this week and it may be a while until I'm able to look into this and help you, sorry. Maybe try tracing through the code with something like git grep leaf_count_ to find where it is set.

@thatlittleboy
Copy link
Author

thatlittleboy commented Jul 15, 2023

Update

An update: I've fixed the segmentation fault, but the leaf_count_ output is currently 0, whereas I was expecting it to be the number of samples that was used to train the booster.

Will try to look into this.


Notes for myself

Code tracing for the dump_model() function

bst.dump_model()
-> python-package/lightgbm/basic.py :: L3984  [definition of `dump_model`]
-> python-package/lightgbm/basic.py :: L4027  [call into `_safe_call(_LIB.LGBM_BoosterDumpModel`]
-> src/c_api.cpp :: L2491  [definition of `LGBM_BoosterDumpModel`, which does `ref_booster->DumpModel` internally]
-> src/boosting/gbdt_model_text.cpp :: L21  [definition of `DumpModel`]
-> src/boosting/gbdt_model_text.cpp :: L96  [which calls models_[i]->ToJSON()]

@jameslamb
Copy link
Collaborator

I just merged the latest changes from master into this. I'm ready to try to help... are you still interested in pursuing this, @thatlittleboy ?

@thatlittleboy
Copy link
Author

@jameslamb Yep I'm interested still to follow up on this, but havent' had much progress since the last update.

TBH it might take a couple of weekends for me to get around to this -- I'll need to re-review where I got stuck and ask more specific questions here (haven't managed to find the few hours needed to sit down and concentrate on this ; been quite busy lately).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stump has no leaf_count inside dump_model() output
2 participants