Fix binary search in `findNodeByOffset` to match linear search result #8143

debonte · 2024-06-15T00:14:08Z

Fixes several issues related to the binary search optimization in findNodeByOffset:

- findNodeByOffset and getIndexContaining had different rules about when an offset is within a ParseNode. findNodeByOffset uses TextRange.overlaps (includes end offset), whereas getIndexContaining uses TextRange.contains (excludes end offset). Given that that was apparently an intentional difference and has been that way for almost 5 years, I'm not interested in changing it. Fixed by allowing callers to provide a matching algorithm as an optional parameter on getIndexContaining.
- After getting the binary search result we need to scan back through the siblings to find the earliest sibling that contains the target offset. This is necessary to match the linear search behavior, which returns the first child that contains the offset.
- findNonNullElement was incrementing/decrementing an array index, but it didn't actually use that index for anything -- it always used the position that was passed in. So if arr[position] was undefined, findNonNullElement would always return undefined.
- After calling findNonNullElement, if the resulting element was a match (contains the offset), we would always return mid, but if arr[mid] is undefined, that's the wrong index to return -- we should return the index of the element that findNonNullElement found.
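
For readers following along, the four fixes above can be sketched roughly as below. This is a simplified illustration, not the code in this PR: `OffsetRange`, `OffsetMatcher`, `containsOffset`, `overlapsOffset`, and `findNonNullIndex` are hypothetical stand-ins, and the sketch assumes the array is sorted by start offset with ranges that at most touch at their ends (it ignores the exceptions to the node range rules).

```typescript
// Simplified stand-in for a text range; not the real pyright type.
interface OffsetRange {
    start: number;
    length: number;
}

type OffsetMatcher = (range: OffsetRange, offset: number) => boolean;

// Excludes the end offset (the behavior described for TextRange.contains).
const containsOffset: OffsetMatcher = (r, offset) => offset >= r.start && offset < r.start + r.length;

// Includes the end offset (the behavior described for TextRange.overlaps).
const overlapsOffset: OffsetMatcher = (r, offset) => offset >= r.start && offset <= r.start + r.length;

// Returns the index of the defined element nearest to `position` within
// [low, high], or -1 if every slot in that window is undefined.
function findNonNullIndex(arr: (OffsetRange | undefined)[], position: number, low: number, high: number): number {
    if (arr[position] !== undefined) {
        return position;
    }
    for (let delta = 1; position - delta >= low || position + delta <= high; delta++) {
        const left = position - delta;
        if (left >= low && arr[left] !== undefined) return left;
        const right = position + delta;
        if (right <= high && arr[right] !== undefined) return right;
    }
    return -1;
}

// Binary search with a caller-supplied matcher. Returns the index of the
// earliest element that matches `offset`, or -1 if none does.
function getIndexContaining(
    arr: (OffsetRange | undefined)[],
    offset: number,
    matches: OffsetMatcher = containsOffset
): number {
    let low = 0;
    let high = arr.length - 1;
    while (low <= high) {
        const mid = Math.floor((low + high) / 2);
        const found = findNonNullIndex(arr, mid, low, high);
        if (found < 0) {
            return -1;
        }
        const item = arr[found]!;
        if (matches(item, offset)) {
            // Scan back so the result agrees with a first-match linear search.
            let earliest = found;
            for (let i = found - 1; i >= 0; i--) {
                const prev = arr[i];
                if (prev === undefined) continue;
                if (!matches(prev, offset)) break;
                earliest = i;
            }
            // Return the index of the element actually examined, not `mid`.
            return earliest;
        }
        // Every slot between `mid` and `found` is undefined, so it can be skipped.
        if (offset < item.start) {
            high = Math.min(mid, found) - 1;
        } else {
            low = Math.max(mid, found) + 1;
        }
    }
    return -1;
}

const children: (OffsetRange | undefined)[] = [
    { start: 0, length: 5 },
    undefined,
    { start: 5, length: 5 },
    { start: 10, length: 5 },
];
const withOverlaps = getIndexContaining(children, 5, overlapsOffset); // 0
const withContains = getIndexContaining(children, 5); // 2
```

At offset 5 the two matchers disagree: the inclusive matcher picks the first range (which ends at 5), while the exclusive one picks the range that starts at 5. Passing the matcher as a parameter lets each caller keep its existing rule.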

I'm torn on the usefulness of the tests and would be fine removing them if you'd prefer, or changing how they work if you have better ideas. We clearly have poor test coverage on this code, but I worry that future changes to the parser could cause these tests to no longer test what they're supposed to be testing (esp. "findNodeByOffset with binary search choose earliest match"). A change to the number of nodes generated would change the binary search's path through the nodes.

heejaechang · 2024-06-17T19:16:38Z

I have a few questions about the contracts on parse nodes:

1. Is it allowed for parse nodes to overlap between siblings?
2. Can the child nodes of a parse node exceed the range of their parent node?
3. Is the order of parse nodes always the same as the order of the text they represent?

Additionally, I have a question about the contracts on findNodeByOffset:

When a position is adjacent to two nodes (the end position of the previous node is equal to the start position of the current node), which node does findNodeByOffset return? Does it return the previous node because (1) is not true, or because no two nodes can be adjacent to each other? Is that guaranteed?

It seems like we need to understand some of the contracts on the parse tree/node structure before deciding how to fix the current issue.

erictraut · 2024-06-20T14:34:48Z

Sorry for the slow response, just noticed this PR.

@heejaechang, to answer your questions, search for isCompliantWithNodeRangeRules in the source code. There are a couple of exceptions to the general rules that the search routines must take into account.

debonte · 2024-06-21T22:33:52Z

> When a position is adjacent to two nodes (the end position of the previous node is equal to the start position of the current node), which node does findNodeByOffset return? Does it return the previous node because (1) is not true, or because no two nodes can be adjacent to each other? Is that guaranteed?

It returns the "previous node" because it always prefers the first sibling that overlaps (as in TextRange.overlaps) the specified position. That was the existing linear search behavior, and is now the behavior of the binary search as well.
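
As a small illustration of that rule, here is a sketch of a first-match linear search at a shared boundary. `Span`, `endInclusiveOverlap`, and `firstOverlappingSibling` are hypothetical names made up for this example, and the inclusive-end test only mirrors the behavior described for TextRange.overlaps.

```typescript
// Hypothetical, simplified sibling node; not a real ParseNode.
interface Span {
    name: string;
    start: number;
    length: number;
}

// Inclusive-end test: an offset equal to the end of the span still counts.
function endInclusiveOverlap(span: Span, offset: number): boolean {
    return offset >= span.start && offset <= span.start + span.length;
}

// Linear search: the first sibling that overlaps the offset wins.
function firstOverlappingSibling(siblings: Span[], offset: number): Span | undefined {
    return siblings.find((s) => endInclusiveOverlap(s, offset));
}

const adjacentSiblings: Span[] = [
    { name: "previous", start: 0, length: 5 },
    { name: "current", start: 5, length: 5 },
];

// Offset 5 is both the end of "previous" and the start of "current".
const boundaryHit = firstOverlappingSibling(adjacentSiblings, 5);
console.log(boundaryHit?.name); // "previous"
```

Both siblings overlap offset 5 under the inclusive rule, so the tie is broken purely by sibling order.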

debonte · 2024-06-21T23:02:34Z

@erictraut, could you review this PR?

As I mentioned in the PR description, I'm happy to remove the tests if you don't think they're worthwhile, or rework them if you see a better approach.

erictraut · 2024-06-22T05:55:32Z

This change shouldn't affect core type checking; it's used only by language server code. If you've confirmed that it's working with the existing LS providers, then I'm fine with it.

github-actions · 2024-06-22T15:59:58Z

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

debonte added 2 commits June 14, 2024 16:33

Fix from pyrx PR 5251

f6274b3

Unit tests

5193465

debonte requested review from heejaechang and erictraut June 15, 2024 00:14

heejaechang approved these changes Jun 21, 2024

erictraut approved these changes Jun 22, 2024

Merge branch 'main' into pylance5775

8d35850

debonte merged commit 3e6661a into microsoft:main Jun 22, 2024
12 checks passed

debonte deleted the pylance5775 branch June 22, 2024 16:10
