Support Python 3.13 #2003

PeterJCLaw · 2024-06-23T12:06:03Z

Various fixes to get the tests passing on Python 3.13, all within the tests themselves.
Also some small tidyups in one of the tests.
Moves to the 3.13 grammar from parso 0.8.4 and add testing on 3.13.

Reviewing by commit may be useful.

Fixes #2002.

Discovered issues:

pathlib has been refactored, there's now pathlib._local and pathlib._abc, which breaks passing pickled objects to subprocesses with different Python versions

This should make it easier to add new entries as well as clarifying the intent of this filter.

In Python 3.13 the `locals` function now returns a fresh mapping each time it's called (when called in a function). We thus need to store a reference to the mapping being used, rather than re-fetching it each time. Since we don't actually need to modify the locals within the scope of the test function itself, it suffices to use our own mapping here rather than the result of calling `locals`, which fully isolates this test from the nature of that function. Fixes davidhalter#2002

This moves to using the 3.13 grammar as well as testing 3.13 in CI.

davidhalter · 2024-06-23T17:39:27Z

Thanks a lot for looking into this! Please ping me once you are finished. I'm also happy to help with issues. It feels like you are almost done, though.

PeterJCLaw · 2024-06-23T18:44:09Z

I'm also happy to help with issues. It feels like you are almost done, though.

Heh, the "obvious" things are fixed. I don't immediately have great ideas for the pathlib thing though. I also haven't looked at why the test_imports.py tests are failing. This could be the same thing or they might be something else.

An idea I have around the pickles issue, but which I don't like, is that we could adjust the binary content of the pickle itself, to rename the path. That can be made to work (and I believe it could be done in a way that's reliable), but it doesn't feel right. It feels like a better solution would be to hook the pickling logic somehow, but I don't know if there are hook points which would enable that?

Jedi passes pickles to subprocesses which are running the target version of Python and thus may not be the same as the version under which Jedi itself is running. In Python 3.13, pathlib is being refactored to allow for easier extension and has thus moved most of its internal implementation to a submodule. Unfortunately this changes the paths of the symbols, causing pickles of those types to fail to load in earlier versions of Python. This commit introduces a custom unpickler which accounts for this move, allowing bi-directional passing of pickles to work.

PeterJCLaw · 2024-06-29T15:44:07Z

86f185e addresses the pathlib move, turns out there is a hook in pickle which solves exactly this issue and was easy to get working :)

davidhalter · 2024-06-29T16:15:00Z

I don't think I would find a better solution to load those pickled objects. However, I think theres also a question of why these pickled objects need to be loaded across Python versions. iter_module_names AFAIK simply returns a list of strings, so if there's suddenly some weird pathlib object in there, I think we might also be able to fix something there. But I can also think about that a bit later once I have time (in a few hours).

PeterJCLaw · 2024-06-29T16:22:42Z

Interesting. I thought I saw a function (get_module_info) trying to be pickled, though maybe that was from a separate call and I misread the logging. I'll circle back to that and confirm what was happening.

I'm currently looking into why Python 3.13 is seeing flaky tests; I can intermittently reproduce it locally, so it's not just CI.

PeterJCLaw · 2024-06-29T18:01:15Z

On the flakes: I've got as far as it seeming to be that an inference state id for a state which the sub-process doesn't know about (or seems not to know about) is being requested to be deleted. From what I can tell the inference state in question has been used with its subprocess (and that use doesn't appear to have errored).

davidhalter · 2024-06-29T18:23:33Z

I thought I saw a function (get_module_info) trying to be pickled, though maybe that was from a separate call and I misread the logging. I'll circle back to that and confirm what was happening.

You are right about that. get_module_info is called from a pytest test with params from pytest. But as far as I can see even for that function there shouldn't be any special classes in there. However it might be that in some places a Path is returned now instead of a str. We can probably just use str(...) wherever that is needed and we will probably be fine?

Does that help? I feel like the Unpickler would work, but I think it's probably bad that we transfer complex Python objects anyway that we don't control, because we're working across Python versions.

I'm happy to also start debugging here if you don't feel like working on this anymore.

On the flakes: I've got as far as it seeming to be that an inference state id for a state which the sub-process doesn't know about (or seems not to know about) is being requested to be deleted. From what I can tell the inference state in question has been used with its subprocess (and that use doesn't appear to have errored).

That sounds pretty bad. This is probably some fallout from my hack of reusing the InferenceState datastructure in subprocesses. Sucks, but I'm not sure how to help. This is something I feel kind of guilty about and will of course work on it, if you don't find a solution quickly. There's not much I can recommend here, this is just annoying work and trying to understand code. Something I always try first is to use pytest -I to use the interpreter and avoid any subprocesses. Typically that solves the problem. I would then try to run 3.13 against 3.13 and see if that works. If only 3.13 against the other versions fails, I would guess that there's issues with pickling sometimes.

I haven't seen any signs about inference states in sub-processes. Is that a result of your debugging? If so, how did you debug this? Asking for a friend, because I'm pretty sure this is also going to haunt me :D

PeterJCLaw · 2024-06-29T18:42:32Z

On the pathlib stuff: I haven't looked in detail at which functions are called, though through debugging the other issue I feel I have a better understanding of what's going on with the subprocesses (in general) now. Having a very brief look at the functions there which could be causing the issue, get_sys_path feels like it could be one -- I believe sys.path supports pathlib.Path entries (and has for a while now), so that could easily have Paths in the returned value. That doesn't quite match up though as I had thought the issue the tests showed were when Python 3.13 was the parent and the error was in the child process. So that would suggests that the Path was in the arguments. Can't remember though - would need to check the test output to be sure.

On the subprocess stuff, I have observed that InferenceStateSubprocess.__del__ is called more than once for an instance with the same self._inference_state_id. I don't know if the id is a guarantee of the same object (especially since it's not the id of itself); can Python re-use ids within the same process if the previous instance has been garbage collected? Might that have changed in Python 3.13? (I know there's a bunch of work with a JIT happening, so GC changes seem plausible). If this is a change it feels a pretty big one and I don't see anything in the release notes.

In terms of how I've been debugging this, it's a lot of print statements I'm afraid. I've added a small hook to get logs back from the child process, but it's not pretty (and relies on collaboration from the subprocess, so it's not full logging; it also avoids using the stderr stream as I couldn't work out how to read that in a non-blocking manner). I've attached a diff to this message in case it's useful though.
debugging.diff.txt

I've then been running a subset of the tests in a loop until they fail:

export JEDI_TEST_ENVIRONMENT=3.10
while pytest --capture=no test/test_integration.py > out.txt 2>&1 ; do echo ; done

edit: added mention of JEDI_TEST_ENVIRONMENT.

I'm going to add some more tracking of the inference state ids to get a sense of where they get created, which will help clarify the GC interaction I hope.

I'm not sure how much more time I'll have to spend on this this weekend and probably won't get back to the pathlib stuff soon. Hopefully I'll make some more progress on the ids today though.

PeterJCLaw · 2024-06-29T19:04:47Z

On the subprocess stuff, I have observed that InferenceStateSubprocess.__del__ is called more than once for an instance with the same self._inference_state_id. I don't know if the id is a guarantee of the same object (especially since it's not the id of itself); can Python re-use ids within the same process if the previous instance has been garbage collected? Might that have changed in Python 3.13?

Looking in the docs for id:

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

So maybe there was always scope for an issue here?

Either way, I think the use of ids to convey identity to the sub-process is the issue as they're not sufficiently unique with the current approach. I can't actually see anything amiss with the handling of the removals though - that seems pretty solid based on how I'd expect __del__ to work and this being essentially single-threaded. The only way I can see this happening is if __del__ for an instance is allowed to happen after the "replacement" instance has been created. Adding logging for that I do actually have evidence for that! This feels like it might actually be a Python bug at this point, since that doesn't seem like a good idea to me (allowing __del__ after another object has been __init__ with the same id and memory address).

Edit: for clarity: that's still looking at InferenceStateSubprocess.__init__ and InferenceStateSubprocess.__del__ (I slightly misread my logs), so not quite the smoking gun I was hoping for just yet.

PeterJCLaw · 2024-06-29T19:21:01Z

Aaah, aha, correcting myself then. I don't think this needs a Python bug.

I believe there's a race of the form:

InferenceState A created, gets id 1
InferenceStateSubprocess A' created, uses InferenceState A which it stores as a weakref and an id
InferenceStateSubprocess A' is used, the sub-process learns about an InferenceState with id 1
InferenceState A goes away, InferenceStateSubprocess A' is not yet garbage collected
InferenceState B created, gets id 1
InferenceStateSubprocess B' created, uses InferenceState B which it stores as a weakref and an id
InferenceStateSubprocess B' is used, the sub-process re-uses its entry for an InferenceState with id 1

Edit: have confirmed this from logging.

At this point the order of operations between the two InferenceStateSubprocess instances going away is immaterial -- both will trigger a removal of a state with id 1. As long as B' doesn't try to use the sub-process again after the first removal has happened then the second removal will fail.

I suspect that this could have happened under any Python version. My guess is that something in the GC behaviour on 3.13 is different and has highlighted the issue.

I think the solution is to create our own ids for the InferenceState instances. A global counter is probably fine I suspect, though it feels less neat than using the id. Another alternative might be to use the id of the InferenceStateSubprocess instance as that's the entity which we're actually locking around. Assuming that __del__ is guaranteed to happen before the next __init__ of an object with the same id, I think that'd guarantee correct behaviour (relying on the existing removals logic being otherwise correct, which I believe it is).

davidhalter · 2024-06-29T22:33:59Z

I think the solution is to create our own ids for the InferenceState instances. A global counter is probably fine I suspect, though it feels less neat than using the id. Another alternative might be to use the id of the InferenceStateSubprocess instance as that's the entity which we're actually locking around. Assuming that del is guaranteed to happen before the next init of an object with the same id, I think that'd guarantee correct behaviour (relying on the existing removals logic being otherwise correct, which I believe it is).

I'm probably fine with either solution, as long as it feels good to you. I generally am really happy that you were able to pin this down.

About the Path stuff: I can also try and install Python3.13 and see if I can reproduce it. I feel like once I can reproduce it, the fix will probably be quick. Whereas the whole InferenceState stuff is way way more complicated.

PeterJCLaw added 6 commits June 23, 2024 12:43

Update 'next' signature in Python 3.13 where it's been fixed

9e6270e

Name this list of accepted symbol differences

11589b4

This should make it easier to add new entries as well as clarifying the intent of this filter.

Clarify filter by flipping the boolean logic

25c7127

Ignore more items extra in Python 3.13

6d37b46

Support Python 3.13

1ffdb02

This moves to using the 3.13 grammar as well as testing 3.13 in CI.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support Python 3.13 #2003

Support Python 3.13 #2003

PeterJCLaw commented Jun 23, 2024 •

edited

Loading

davidhalter commented Jun 23, 2024

PeterJCLaw commented Jun 23, 2024

PeterJCLaw commented Jun 29, 2024

davidhalter commented Jun 29, 2024 •

edited

Loading

PeterJCLaw commented Jun 29, 2024

PeterJCLaw commented Jun 29, 2024

davidhalter commented Jun 29, 2024

PeterJCLaw commented Jun 29, 2024 •

edited

Loading

PeterJCLaw commented Jun 29, 2024 •

edited

Loading

PeterJCLaw commented Jun 29, 2024 •

edited

Loading

davidhalter commented Jun 29, 2024

Support Python 3.13 #2003

Are you sure you want to change the base?

Support Python 3.13 #2003

Conversation

PeterJCLaw commented Jun 23, 2024 • edited Loading

davidhalter commented Jun 23, 2024

PeterJCLaw commented Jun 23, 2024

PeterJCLaw commented Jun 29, 2024

davidhalter commented Jun 29, 2024 • edited Loading

PeterJCLaw commented Jun 29, 2024

PeterJCLaw commented Jun 29, 2024

davidhalter commented Jun 29, 2024

PeterJCLaw commented Jun 29, 2024 • edited Loading

PeterJCLaw commented Jun 29, 2024 • edited Loading

PeterJCLaw commented Jun 29, 2024 • edited Loading

davidhalter commented Jun 29, 2024

PeterJCLaw commented Jun 23, 2024 •

edited

Loading

davidhalter commented Jun 29, 2024 •

edited

Loading

PeterJCLaw commented Jun 29, 2024 •

edited

Loading

PeterJCLaw commented Jun 29, 2024 •

edited

Loading

PeterJCLaw commented Jun 29, 2024 •

edited

Loading