Starting from 0.14.0, an unwanted fork can happen when importing statsmodels #9289

Open
WindSoilder opened this issue Jun 21, 2024 · 12 comments · May be fixed by #9291

Comments

@WindSoilder

Describe the bug

When importing statsmodels, a fork system call is made, but I don't think it's necessary.

I'm using statsmodels with numpy 1.26.1, but I believe I can confirm the same issue with numpy 2.0.

Here is the relevant backtrace:

   import statsmodels.api as sm
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/usr/local/lib/python3.9/site-packages/statsmodels/__init__.py", line 1, in <module>
    from statsmodels.compat.patsy import monkey_patch_cat_dtype
  File "/usr/local/lib/python3.9/site-packages/statsmodels/compat/__init__.py", line 1, in <module>
    from statsmodels.tools._testing import PytestTester
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/usr/local/lib/python3.9/site-packages/statsmodels/tools/__init__.py", line 2, in <module>
    from statsmodels.tools._testing import PytestTester
  File "/usr/local/lib/python3.9/site-packages/statsmodels/tools/_testing.py", line 17, in <module>
    from numpy.testing import assert_allclose, assert_
  File "/usr/local/lib/python3.9/site-packages/numpy/testing/__init__.py", line 11, in <module>
    from ._private.utils import *
  File "/usr/local/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 1253, in <module>
    _SUPPORTS_SVE = check_support_sve()
  File "/usr/local/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 1247, in check_support_sve
    output = subprocess.run(cmd, capture_output=True, text=True)

The issue is that importing statsmodels runs this:

from numpy.testing import assert_allclose, assert_

numpy then ends up running something like fork (via subprocess.run, as the backtrace shows).

Code Sample, a copy-pastable example if possible

import statsmodels.api as sm

If the issue has not been resolved, please file it in the issue tracker.

Expected Output

I expected the import not to call fork.

Output of import statsmodels.api as sm; sm.show_versions()

INSTALLED VERSIONS

Python: 3.9.7.final.0
OS: Darwin 22.6.0 Darwin Kernel Version 22.6.0: Mon Apr 22 20:54:28 PDT 2024; root:xnu-8796.141.3.705.2~1/RELEASE_X86_64 x86_64
byteorder: little
LC_ALL: None
LANG: None

statsmodels

Installed: 0.14.2 (/local/lib/python3.9/site-packages/statsmodels)

Required Dependencies

cython: 0.29.26 (/local/lib/python3.9/site-packages/Cython)
numpy: 1.26.4 (/local/lib/python3.9/site-packages/numpy)
scipy: 1.13.1 (/local/lib/python3.9/site-packages/scipy)
pandas: 1.5.3 (/local/lib/python3.9/site-packages/pandas)
dateutil: 2.8.2 (/local/lib/python3.9/site-packages/dateutil)
patsy: 0.5.6 (/local/lib/python3.9/site-packages/patsy)

Optional Dependencies

matplotlib: 3.6.3 (/local/lib/python3.9/site-packages/matplotlib)
backend: MacOSX
cvxopt: Not installed
joblib: Not installed

Developer Tools

IPython: 8.18.1 (/local/lib/python3.9/site-packages/IPython)
jinja2: 3.0.1 (/local/lib/python3.9/site-packages/jinja2)
sphinx: Not installed
pygments: 2.14.0 (/local/lib/python3.9/site-packages/pygments)
pytest: 6.2.4 (/local/lib/python3.9/site-packages/pytest)
virtualenv: Not installed

@josef-pkt
Member

Is there anything specific to statsmodels or is this just numpy behavior that we cannot control?

@bashtage
Member

This is a numpy issue. It uses subprocess to detect certain capabilities which is causing problems for you.

Are you running in some nonstandard environment (e.g., a compiled executable?)

@WindSoilder
Author

Is there anything specific to statsmodels or is this just numpy behavior that we cannot control?

I'm not quite sure. Before 0.14.0, running import statsmodels did not execute something like this:

from statsmodels.compat.patsy import monkey_patch_cat_dtype

which makes numpy run check_support_sve.

Are you running in some nonstandard environment (e.g., a compiled executable?)

Nope, I'm just running a regular python script.

@bashtage
Member

Yes, the monkey patch is new and due to breaks in some versions of pandas. What version of statsmodels are you using? What NumPy are you using?

@WindSoilder
Author

I'm using statsmodels 0.14.2 and numpy 1.26.4.

Sorry, I think I already posted this information in the description; it is the output of import statsmodels.api as sm; sm.show_versions()

@bashtage
Member

I can't reproduce. On ubuntu:

sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.9 python3.9-dev
virtualenv sm-test --python=python3.9
source sm-test/bin/activate
python -m pip install "statsmodels==0.14.2" "numpy==1.26.4"
python -c "import statsmodels.api as sm; print('Working')"

Seems to be something specific to your install. Are you running Linux, Windows or OSX?

@bashtage
Member

Can you post the full traceback with the error details?

@WindSoilder
Author

Oh, I'm sorry. I printed the traceback myself. import statsmodels.api as sm doesn't raise any exception.

The point of my issue is that importing statsmodels shouldn't execute this statement:

from statsmodels.tools._testing import PytestTester

Because it triggers a fork() system call.
That is a waste of resources, and some TCP-related libraries may re-create a connection after fork().
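
A traceback like the one in the description can be captured without any exception being raised. The following is a minimal sketch, assuming Python 3.8 or later (where `sys.addaudithook` and the `subprocess.Popen` audit event are available). The names `trace_subprocess_launches` and `spawn_child` are hypothetical, `spawn_child` only stands in for the import under investigation, and this is not necessarily how the traceback above was produced.

```python
import subprocess
import sys
import traceback

_events = []      # (event name, formatted stack) pairs recorded so far
_enabled = False  # audit hooks cannot be removed, so recording is gated by a flag


def _hook(event, args):
    # The subprocess module raises "subprocess.Popen" when it starts a child.
    if _enabled and event == "subprocess.Popen":
        _events.append((event, "".join(traceback.format_stack())))


sys.addaudithook(_hook)


def trace_subprocess_launches(func):
    """Run func() and return the stacks of any subprocess launches it made."""
    global _enabled
    del _events[:]
    _enabled = True
    try:
        func()
    finally:
        _enabled = False
    return list(_events)


def spawn_child():
    # Hypothetical stand-in for `import statsmodels.api`: starts a trivial child.
    subprocess.run([sys.executable, "-c", "pass"], check=True)


launches = trace_subprocess_launches(spawn_child)
for event, stack in launches:
    print(event)
    print(stack)
```

To inspect a real import, pass something like `lambda: importlib.import_module("statsmodels.api")` instead of `spawn_child`, in a fresh interpreter so the module is not already cached in `sys.modules`.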

@josef-pkt
Member

import statsmodels.api should trigger import statsmodels.__init__
so it will also import the pytester.
something sounds strange here.

checking some details.

statsmodels.__init__ uses a lazy import of PytestTester inside a test function just to avoid the unconditional import.
However, statsmodels.compat.__init__ is imported to apply the monkeypatch, and that module does not use a lazy import.

We could also make the import lazy in statsmodels.compat.__init__.
However, what's the point? Any usage of statsmodels (except for the plain import statsmodels) will trigger the PytestTester import. It's supposed to be in every __init__ of a subdirectory and will be imported before importing any useful function or class.
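
For readers unfamiliar with the lazy-import pattern mentioned here, a minimal sketch follows. It is not the statsmodels code: `LazyAttr` is a hypothetical helper, and `json.dumps` merely plays the role of a test helper whose import we want to postpone until first use.

```python
import importlib


class LazyAttr:
    """Resolve `module_name.attr_name` on first call instead of at import time."""

    def __init__(self, module_name, attr_name):
        self._module_name = module_name
        self._attr_name = attr_name
        self._target = None  # filled in on first use

    @property
    def loaded(self):
        return self._target is not None

    def __call__(self, *args, **kwargs):
        if self._target is None:
            # The import cost (and any side effects) is paid here, not at startup.
            module = importlib.import_module(self._module_name)
            self._target = getattr(module, self._attr_name)
        return self._target(*args, **kwargs)


# Stand-in target: json.dumps plays the role of an expensive test helper.
dumps = LazyAttr("json", "dumps")
print(dumps.loaded)     # False
print(dumps({"a": 1}))  # {"a": 1}
print(dumps.loaded)     # True
```

The trade-off raised in the comment still applies: deferring the import only helps if the wrapped object is rarely used on the normal code path.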

@bashtage linked a pull request on Jun 25, 2024 that will close this issue
@bashtage
Member

There is an easy fix. There is some testing code in tools that loads NumPy testing assert classes. Easy to refactor away from these imports.
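
As an illustration of what refactoring away from these imports could look like, here is a rough sketch of a local closeness check that needs no `numpy.testing` import. It is an assumption made for illustration, not the content of the linked PR: `assert_close_sequences` is a hypothetical name, it handles only flat sequences of numbers, and it borrows the `atol + rtol * |desired|` tolerance rule and the default tolerances from `numpy.testing.assert_allclose`.

```python
def assert_close_sequences(actual, desired, rtol=1e-7, atol=0.0):
    """Raise AssertionError unless |a - d| <= atol + rtol * |d| for every pair."""
    actual = list(actual)
    desired = list(desired)
    if len(actual) != len(desired):
        raise AssertionError(
            f"length mismatch: {len(actual)} != {len(desired)}"
        )
    # `not (x <= y)` also flags NaN, since comparisons with NaN are False.
    bad = [
        (i, a, d)
        for i, (a, d) in enumerate(zip(actual, desired))
        if not abs(a - d) <= atol + rtol * abs(d)
    ]
    if bad:
        raise AssertionError(f"not close at (index, actual, desired): {bad}")


assert_close_sequences([1.0, 2.0], [1.0, 2.0 + 1e-9])  # within tolerance

try:
    assert_close_sequences([1.0], [1.1])
except AssertionError as exc:
    print("mismatch detected:", exc)
```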

@josef-pkt
Member

Is it only the numpy.testing asserts that cause the forking?

new PR looks good to me

@WindSoilder
Author

However, what's the point? Any usage of statsmodels (except for the plain import statsmodels) will trigger the PytestTester import. It's supposed to be in every __init__ of a subdirectory and will be imported before importing any useful function or class.

Yeah, I agree with your point. From my side, I just don't want importing statsmodels to trigger a fork, especially when I run it as a service: when I start several service instances, they suddenly fork many times, which can put a heavy load on my machine all at once.
I can lazily import statsmodels to avoid this behavior, but I think it would be good to resolve it here.
