Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix binary search in findNodeByOffset to match linear search result #8143

Merged
merged 3 commits into from
Jun 22, 2024

Conversation

debonte
Copy link
Collaborator

@debonte debonte commented Jun 15, 2024

Address microsoft/pylance-release#5775

Fixes several issues related to the binary search optimization in findNodeByOffset:

  1. findNodeByOffset and getIndexContaining had different rules about when an offset is within a ParseNode. findNodeByOffset uses TextRange.overlaps (includes end offset), whereas getIndexContaining uses TextRange.contains (excludes end offset). Given that that was apparently an intentional difference and has been that way for almost 5 years, I'm not interested in changing it. Fixed by allowing callers to provide a matching algorithm as an optional parameter on getIndexContaining.
  2. After getting the binary search result we need to scan back through the siblings to find the earliest sibling that contains the target offset. This is necessary to match the linear search behavior, which returns the first child that contains the offset.
  3. findNonNullElement was incrementing/decrementing an array index, but it didn't actually use that index for anything -- it always used the position that was passed in. So if arr[position] was undefined, findNonNullElement would always return undefined.
  4. After calling findNonNullElement, if the resulting element was a match (contains the offset), we would always return mid, but if arr[mid] is undefined, that's the wrong index to return -- we should return the index of the element that findNonNullElement found.

I'm torn on the usefullness of the tests and would be fine removing them if you'd prefer, or to change how they work if you have better ideas. We clearly have poor test coverage on this code, but I worry that future changes to the parser could cause these tests to no longer test what they're supposed to be testing (esp. "findNodeByOffset with binary search choose earliest match"). A change to the number of nodes generated would change the binary search's path through the nodes.

This comment has been minimized.

@heejaechang
Copy link
Collaborator

heejaechang commented Jun 17, 2024

I have a few questions about the contracts on parse nodes:

  1. Is it allowed for parse nodes to overlap between siblings?
  2. Can the child nodes of a parse node exceed the range of their parent node?
  3. Is the order of parse nodes always the same as the order of the text they represent?

Additionally, I have a question about the contracts on findNodeByOffset:

When a position is adjacent to two nodes (the end position of the previous node is equal to the start position of the current node), which node does findNodeByOffset return? Does it return the previous node because (1) is not true, or because no two nodes can be adjacent to each other? Is that guaranteed?

It seems like we need to understand some of the contracts on the parse tree/node structure before deciding how to fix the current issue.

@erictraut
Copy link
Collaborator

Sorry for the slow response, just noticed this PR.

@heejaechang, to answer your questions, search for isCompliantWithNodeRangeRules in the source code. There are a couple of exceptions to the general rules that the search routines must take into account.

@debonte
Copy link
Collaborator Author

debonte commented Jun 21, 2024

When a position is adjacent to two nodes (the end position of the previous node is equal to the start position of the current node), which node does findNodeByOffset return? Does it return the previous node because (1) is not true, or because no two nodes can be adjacent to each other? Is that guaranteed?

It returns the "previous node" because it always prefers the first sibling that overlaps (as in TextRange.overlaps) the specified position. That was the existing linear search behavior, and is now the behavior of the binary search as well.

@debonte
Copy link
Collaborator Author

debonte commented Jun 21, 2024

@erictraut, could you review this PR?

As I mentioned in the PR description, I'm happy to remove the tests if you don't think they're worthwhile, or rework them if you see a better approach.

@erictraut
Copy link
Collaborator

This change shouldn't affect core type checking; it's used only by language server code. If you've confirmed that it's working with the existing LS providers, then I'm fine with it.

Copy link
Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@debonte debonte merged commit 3e6661a into microsoft:main Jun 22, 2024
12 checks passed
@debonte debonte deleted the pylance5775 branch June 22, 2024 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants