Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add the explain_plan function #1328

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

nuvic
Copy link

@nuvic nuvic commented May 28, 2024

It's useful to see the underlying query plan for debugging purposes. This exposes LanceScanner's explain_plan function. Addresses #1288

@github-actions github-actions bot added enhancement New feature or request Python Python SDK labels May 28, 2024
Copy link

ACTION NEEDED

Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@nuvic nuvic force-pushed the feat/explain-plan-in-sdk branch from 18d8876 to 2021d5a Compare May 28, 2024 18:26
@nuvic nuvic changed the title feat: Add the explain_plan function feat: add the explain_plan function May 28, 2024
@nuvic
Copy link
Author

nuvic commented May 28, 2024

Hi @wjones127, I've added the explain_plan function to the python package. Just wanted to check that this is the correct approach before I continue. Thanks

Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking explain_plan would go on LanceQueryBuilder, so users could call it on their exact calls to make a query. If you do it like this, users have to figure out how to translate their parameters in this query.

There is a method _execute_query, where the plan is executed. You would do the same thing in there, but call explain_plan() after the scanner() call:

return ds.scanner(
columns=query.columns,
filter=query.filter,
prefilter=query.prefilter,
nearest={
"column": query.vector_column,
"q": query.vector,
"k": query.k,
"metric": query.metric,
"nprobes": query.nprobes,
"refine_factor": query.refine_factor,
},
with_row_id=query.with_row_id,
batch_size=batch_size,
).to_reader()

@@ -417,6 +417,35 @@ def with_row_id(self, with_row_id: bool) -> LanceQueryBuilder:
self._with_row_id = with_row_id
return self

def explain_plan(self, verbose: Optional[bool] = False) -> str:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes more sense. Added to LanceQueryBuilder

Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few more changes you'll need to make to get the doctests to pass, but otherwise looks good.

Comment on lines 425 to 427
>>> import lancedb
>>> db = lancedb.connect("./.lancedb")
>>> table = db.open_table("my_table")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the example to work, you'll need to create the table:

Suggested change
>>> import lancedb
>>> db = lancedb.connect("./.lancedb")
>>> table = db.open_table("my_table")
>>> import lancedb
>>> db = lancedb.connect("./.lancedb")
>>> db.create_table("my_table", [{"vector": [99, 99]}])
>>> table = db.open_table("my_table")

Comment on lines 429 to 430
>>> plan = table.search(query).explain_plan(True)
>>> print(plan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make the example more informative and get the example to pass, you should print the output. Also, you can use ... as a wildcard so you don't have to write it all out.

Suggested change
>>> plan = table.search(query).explain_plan(True)
>>> print(plan)
>>> table.search(query).explain_plan(True)
>>> print(plan)
Projection: fields=[i, s, vec, _distance]
Take: columns=\"_distance, _rowid, vec, i, s\"
SortExec: TopK(fetch=5), expr=...
KNNIndex: name=..., k=5, deltas=1
ScalarIndexQuery: query=i > 10

To run the doctests locally, you can run:

pytest --doctest-modules python/lancedb

Copy link
Author

@nuvic nuvic Jun 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. Okay I've updated the examples so they pass the doctest
2aabbbc
f3b9624

@nuvic
Copy link
Author

nuvic commented Jun 3, 2024

I'm trying to figure out how to add this to the nodejs sdk. Looking at nodejs/src, there doesn't seem to be a Scanner exposed there, so it's not as simple as the python solution.
Since we want to put explain_plan in the Query data structure (nodejs/src/query.rs), we need to get access to a dataset ref there, to get to the scanner. What do you recommend?

@wjones127
Copy link
Contributor

I'm trying to figure out how to add this to the nodejs sdk. Looking at nodejs/src, there doesn't seem to be a Scanner exposed there, so it's not as simple as the python solution.

This is true. Although you've only implemented for the legacy Python sync API, which wraps pylance. In the future, we'll also want to add to the new Python async API. (Eventually the legacy Python sync API will be replaced by wrapping the async API, and LanceDB will no longer depend on pylance.) The async API also doesn't have a scanner directly exposed.

Since we want to put explain_plan in the Query data structure (nodejs/src/query.rs), we need to get access to a dataset ref there, to get to the scanner. What do you recommend?

You will need to add a explain_plan method to the Rust Query structure here:

impl Query {

And that should call an explain_plan method you'll have to add to to the ExecutableQuery trait in this file:

pub trait ExecutableQuery {

Doing this for Python async API will be very similar to this.

@github-actions github-actions bot added the Rust Rust related issues label Jun 17, 2024
@nuvic nuvic marked this pull request as ready for review June 24, 2024 03:56
@nuvic
Copy link
Author

nuvic commented Jun 24, 2024

Thanks for the explanation above. I've now implemented the method for the python async API and the nodejs API. Could you take a look?

@@ -1756,6 +1757,37 @@ impl TableInternal for NativeTable {
.await
}

async fn explain_plan(&self, query: &VectorQuery, verbose: bool) -> Result<String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this replicates some (but not all) of the logic from create_plan above. Could we instead pull this out to some common function (build_plan?) and then both create_plan and explain_plan can be built off of those?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Python Python SDK Rust Rust related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants