Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Python 3.13 #2003

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from
Draft

Conversation

PeterJCLaw
Copy link
Collaborator

@PeterJCLaw PeterJCLaw commented Jun 23, 2024

Various fixes to get the tests passing on Python 3.13, all within the tests themselves.
Also some small tidyups in one of the tests.
Moves to the 3.13 grammar from parso 0.8.4 and add testing on 3.13.

Reviewing by commit may be useful.

Fixes #2002.

Discovered issues:

  • pathlib has been refactored, there's now pathlib._local and pathlib._abc, which breaks passing pickled objects to subprocesses with different Python versions

This should make it easier to add new entries as well as clarifying
the intent of this filter.
In Python 3.13 the `locals` function now returns a fresh mapping
each time it's called (when called in a function). We thus need
to store a reference to the mapping being used, rather than
re-fetching it each time.

Since we don't actually need to modify the locals within the scope
of the test function itself, it suffices to use our own mapping
here rather than the result of calling `locals`, which fully
isolates this test from the nature of that function.

Fixes davidhalter#2002
This moves to using the 3.13 grammar as well as testing 3.13 in CI.
@davidhalter
Copy link
Owner

Thanks a lot for looking into this! Please ping me once you are finished. I'm also happy to help with issues. It feels like you are almost done, though.

@PeterJCLaw
Copy link
Collaborator Author

I'm also happy to help with issues. It feels like you are almost done, though.

Heh, the "obvious" things are fixed. I don't immediately have great ideas for the pathlib thing though. I also haven't looked at why the test_imports.py tests are failing. This could be the same thing or they might be something else.

An idea I have around the pickles issue, but which I don't like, is that we could adjust the binary content of the pickle itself, to rename the path. That can be made to work (and I believe it could be done in a way that's reliable), but it doesn't feel right. It feels like a better solution would be to hook the pickling logic somehow, but I don't know if there are hook points which would enable that?

Jedi passes pickles to subprocesses which are running the target
version of Python and thus may not be the same as the version
under which Jedi itself is running. In Python 3.13, pathlib is
being refactored to allow for easier extension and has thus moved
most of its internal implementation to a submodule. Unfortunately
this changes the paths of the symbols, causing pickles of those
types to fail to load in earlier versions of Python.

This commit introduces a custom unpickler which accounts for this
move, allowing bi-directional passing of pickles to work.
@PeterJCLaw
Copy link
Collaborator Author

86f185e addresses the pathlib move, turns out there is a hook in pickle which solves exactly this issue and was easy to get working :)

@davidhalter
Copy link
Owner

davidhalter commented Jun 29, 2024

I don't think I would find a better solution to load those pickled objects. However, I think theres also a question of why these pickled objects need to be loaded across Python versions. iter_module_names AFAIK simply returns a list of strings, so if there's suddenly some weird pathlib object in there, I think we might also be able to fix something there. But I can also think about that a bit later once I have time (in a few hours).

@PeterJCLaw
Copy link
Collaborator Author

Interesting. I thought I saw a function (get_module_info) trying to be pickled, though maybe that was from a separate call and I misread the logging. I'll circle back to that and confirm what was happening.

I'm currently looking into why Python 3.13 is seeing flaky tests; I can intermittently reproduce it locally, so it's not just CI.

@PeterJCLaw
Copy link
Collaborator Author

On the flakes: I've got as far as it seeming to be that an inference state id for a state which the sub-process doesn't know about (or seems not to know about) is being requested to be deleted. From what I can tell the inference state in question has been used with its subprocess (and that use doesn't appear to have errored).

@davidhalter
Copy link
Owner

I thought I saw a function (get_module_info) trying to be pickled, though maybe that was from a separate call and I misread the logging. I'll circle back to that and confirm what was happening.

You are right about that. get_module_info is called from a pytest test with params from pytest. But as far as I can see even for that function there shouldn't be any special classes in there. However it might be that in some places a Path is returned now instead of a str. We can probably just use str(...) wherever that is needed and we will probably be fine?

Does that help? I feel like the Unpickler would work, but I think it's probably bad that we transfer complex Python objects anyway that we don't control, because we're working across Python versions.

I'm happy to also start debugging here if you don't feel like working on this anymore.

On the flakes: I've got as far as it seeming to be that an inference state id for a state which the sub-process doesn't know about (or seems not to know about) is being requested to be deleted. From what I can tell the inference state in question has been used with its subprocess (and that use doesn't appear to have errored).

That sounds pretty bad. This is probably some fallout from my hack of reusing the InferenceState datastructure in subprocesses. Sucks, but I'm not sure how to help. This is something I feel kind of guilty about and will of course work on it, if you don't find a solution quickly. There's not much I can recommend here, this is just annoying work and trying to understand code. Something I always try first is to use pytest -I to use the interpreter and avoid any subprocesses. Typically that solves the problem. I would then try to run 3.13 against 3.13 and see if that works. If only 3.13 against the other versions fails, I would guess that there's issues with pickling sometimes.

I haven't seen any signs about inference states in sub-processes. Is that a result of your debugging? If so, how did you debug this? Asking for a friend, because I'm pretty sure this is also going to haunt me :D

@PeterJCLaw
Copy link
Collaborator Author

PeterJCLaw commented Jun 29, 2024

On the pathlib stuff: I haven't looked in detail at which functions are called, though through debugging the other issue I feel I have a better understanding of what's going on with the subprocesses (in general) now. Having a very brief look at the functions there which could be causing the issue, get_sys_path feels like it could be one -- I believe sys.path supports pathlib.Path entries (and has for a while now), so that could easily have Paths in the returned value. That doesn't quite match up though as I had thought the issue the tests showed were when Python 3.13 was the parent and the error was in the child process. So that would suggests that the Path was in the arguments. Can't remember though - would need to check the test output to be sure.

On the subprocess stuff, I have observed that InferenceStateSubprocess.__del__ is called more than once for an instance with the same self._inference_state_id. I don't know if the id is a guarantee of the same object (especially since it's not the id of itself); can Python re-use ids within the same process if the previous instance has been garbage collected? Might that have changed in Python 3.13? (I know there's a bunch of work with a JIT happening, so GC changes seem plausible). If this is a change it feels a pretty big one and I don't see anything in the release notes.

In terms of how I've been debugging this, it's a lot of print statements I'm afraid. I've added a small hook to get logs back from the child process, but it's not pretty (and relies on collaboration from the subprocess, so it's not full logging; it also avoids using the stderr stream as I couldn't work out how to read that in a non-blocking manner). I've attached a diff to this message in case it's useful though.
debugging.diff.txt

I've then been running a subset of the tests in a loop until they fail:

export JEDI_TEST_ENVIRONMENT=3.10
while pytest --capture=no test/test_integration.py > out.txt 2>&1 ; do echo ; done

edit: added mention of JEDI_TEST_ENVIRONMENT.

I'm going to add some more tracking of the inference state ids to get a sense of where they get created, which will help clarify the GC interaction I hope.

I'm not sure how much more time I'll have to spend on this this weekend and probably won't get back to the pathlib stuff soon. Hopefully I'll make some more progress on the ids today though.

@PeterJCLaw
Copy link
Collaborator Author

PeterJCLaw commented Jun 29, 2024

On the subprocess stuff, I have observed that InferenceStateSubprocess.__del__ is called more than once for an instance with the same self._inference_state_id. I don't know if the id is a guarantee of the same object (especially since it's not the id of itself); can Python re-use ids within the same process if the previous instance has been garbage collected? Might that have changed in Python 3.13?

Looking in the docs for id:

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

So maybe there was always scope for an issue here?

Either way, I think the use of ids to convey identity to the sub-process is the issue as they're not sufficiently unique with the current approach. I can't actually see anything amiss with the handling of the removals though - that seems pretty solid based on how I'd expect __del__ to work and this being essentially single-threaded. The only way I can see this happening is if __del__ for an instance is allowed to happen after the "replacement" instance has been created. Adding logging for that I do actually have evidence for that! This feels like it might actually be a Python bug at this point, since that doesn't seem like a good idea to me (allowing __del__ after another object has been __init__ with the same id and memory address).

Edit: for clarity: that's still looking at InferenceStateSubprocess.__init__ and InferenceStateSubprocess.__del__ (I slightly misread my logs), so not quite the smoking gun I was hoping for just yet.

@PeterJCLaw
Copy link
Collaborator Author

PeterJCLaw commented Jun 29, 2024

Aaah, aha, correcting myself then. I don't think this needs a Python bug.

I believe there's a race of the form:

  • InferenceState A created, gets id 1
  • InferenceStateSubprocess A' created, uses InferenceState A which it stores as a weakref and an id
  • InferenceStateSubprocess A' is used, the sub-process learns about an InferenceState with id 1
  • InferenceState A goes away, InferenceStateSubprocess A' is not yet garbage collected
  • InferenceState B created, gets id 1
  • InferenceStateSubprocess B' created, uses InferenceState B which it stores as a weakref and an id
  • InferenceStateSubprocess B' is used, the sub-process re-uses its entry for an InferenceState with id 1

Edit: have confirmed this from logging.

At this point the order of operations between the two InferenceStateSubprocess instances going away is immaterial -- both will trigger a removal of a state with id 1. As long as B' doesn't try to use the sub-process again after the first removal has happened then the second removal will fail.

I suspect that this could have happened under any Python version. My guess is that something in the GC behaviour on 3.13 is different and has highlighted the issue.

I think the solution is to create our own ids for the InferenceState instances. A global counter is probably fine I suspect, though it feels less neat than using the id. Another alternative might be to use the id of the InferenceStateSubprocess instance as that's the entity which we're actually locking around. Assuming that __del__ is guaranteed to happen before the next __init__ of an object with the same id, I think that'd guarantee correct behaviour (relying on the existing removals logic being otherwise correct, which I believe it is).

@davidhalter
Copy link
Owner

I think the solution is to create our own ids for the InferenceState instances. A global counter is probably fine I suspect, though it feels less neat than using the id. Another alternative might be to use the id of the InferenceStateSubprocess instance as that's the entity which we're actually locking around. Assuming that del is guaranteed to happen before the next init of an object with the same id, I think that'd guarantee correct behaviour (relying on the existing removals logic being otherwise correct, which I believe it is).

I'm probably fine with either solution, as long as it feels good to you. I generally am really happy that you were able to pin this down.

About the Path stuff: I can also try and install Python3.13 and see if I can reproduce it. I feel like once I can reproduce it, the fix will probably be quick. Whereas the whole InferenceState stuff is way way more complicated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test_completion_param_annotations fails in Python 3.13
2 participants