feat: add the explain_plan function #1328
ACTION NEEDED Lance follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error please inspect the "PR Title Check" action.
18d8876 to 2021d5a
Hi @wjones127, I've added the |
I was thinking explain_plan would go on LanceQueryBuilder, so users could call it on their exact calls to make a query. If you do it like this, users have to figure out how to translate their parameters in this query.
There is a method _execute_query, where the plan is executed. You would do the same thing in there, but call explain_plan() after the scanner() call:
lancedb/python/python/lancedb/table.py
Lines 1670 to 1684 in 2021d5a
return ds.scanner(
    columns=query.columns,
    filter=query.filter,
    prefilter=query.prefilter,
    nearest={
        "column": query.vector_column,
        "q": query.vector,
        "k": query.k,
        "metric": query.metric,
        "nprobes": query.nprobes,
        "refine_factor": query.refine_factor,
    },
    with_row_id=query.with_row_id,
    batch_size=batch_size,
).to_reader()
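The suggestion above, putting explain_plan on the query builder so it reuses the parameters the user already set, could be sketched roughly as follows. This is a minimal illustration with stand-in classes: Query, StubScanner, StubTable, QueryBuilder and the _explain_plan helper are hypothetical names, not the actual LanceDB or pylance classes, and the plan text is made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Query:
    """Hypothetical container for the parameters collected by the builder."""
    vector: List[float]
    k: int = 10
    filter: Optional[str] = None


class StubScanner:
    """Stand-in for a dataset scanner; not the real pylance scanner."""

    def __init__(self, query: Query) -> None:
        self._query = query

    def explain_plan(self, verbose: bool = False) -> str:
        # Produce a made-up, human-readable plan from the query parameters.
        lines = [f"KNN: k={self._query.k}"]
        if self._query.filter is not None:
            lines.append(f"  Filter: {self._query.filter}")
        if verbose:
            lines.append(f"  Vector dimensions: {len(self._query.vector)}")
        return "\n".join(lines)


class StubTable:
    """Stand-in for a table that knows how to turn a Query into a scanner."""

    def _explain_plan(self, query: Query, verbose: bool = False) -> str:
        # Build the scanner from the query, then ask it for its plan
        # instead of reading results from it.
        return StubScanner(query).explain_plan(verbose)


class QueryBuilder:
    """Fluent builder; explain_plan reuses exactly what the user configured."""

    def __init__(self, table: StubTable, vector: List[float]) -> None:
        self._table = table
        self._query = Query(vector=list(vector))

    def limit(self, k: int) -> "QueryBuilder":
        self._query.k = k
        return self

    def where(self, filter: str) -> "QueryBuilder":
        self._query.filter = filter
        return self

    def explain_plan(self, verbose: bool = False) -> str:
        return self._table._explain_plan(self._query, verbose)


plan = QueryBuilder(StubTable(), [1.0, 2.0]).limit(5).where("i > 10").explain_plan()
print(plan)
```

With this shape, the caller never has to restate k or the filter when asking for the plan; the builder passes along the same Query it would use to run the search.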
@@ -417,6 +417,35 @@ def with_row_id(self, with_row_id: bool) -> LanceQueryBuilder:
        self._with_row_id = with_row_id
        return self

    def explain_plan(self, verbose: Optional[bool] = False) -> str:
Thanks, that makes more sense. Added to LanceQueryBuilder
There are a few more changes you'll need to make to get the doctests to pass, but otherwise looks good.
python/python/lancedb/query.py
Outdated
>>> import lancedb
>>> db = lancedb.connect("./.lancedb")
>>> table = db.open_table("my_table")
For the example to work, you'll need to create the table:
- >>> import lancedb
- >>> db = lancedb.connect("./.lancedb")
- >>> table = db.open_table("my_table")
+ >>> import lancedb
+ >>> db = lancedb.connect("./.lancedb")
+ >>> db.create_table("my_table", [{"vector": [99, 99]}])
+ >>> table = db.open_table("my_table")
python/python/lancedb/query.py
Outdated
>>> plan = table.search(query).explain_plan(True)
>>> print(plan)
To make the example more informative and get the example to pass, you should print the output. Also, you can use ... as a wildcard so you don't have to write it all out.
- >>> plan = table.search(query).explain_plan(True)
- >>> print(plan)
+ >>> table.search(query).explain_plan(True)
+ >>> print(plan)
+ Projection: fields=[i, s, vec, _distance]
+ Take: columns=\"_distance, _rowid, vec, i, s\"
+ SortExec: TopK(fetch=5), expr=...
+ KNNIndex: name=..., k=5, deltas=1
+ ScalarIndexQuery: query=i > 10
To run the doctests locally, you can run:
pytest --doctest-modules python/lancedb
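As a rough illustration of the ... wildcard mentioned above: doctest only treats ... in expected output as a wildcard when its ELLIPSIS option is enabled, either per example with a directive or through the test runner's configuration. The sketch below uses the per-example directive and a made-up fake_explain_plan function with invented plan text; it does not call LanceDB.

```python
import doctest


def fake_explain_plan() -> str:
    """Return a made-up plan string (a stand-in for real explain_plan output).

    >>> print(fake_explain_plan())  # doctest: +ELLIPSIS
    Projection: fields=[i, s, vec, _distance]
    SortExec: TopK(fetch=5), expr=...
    KNNIndex: name=..., k=5, deltas=1
    """
    return (
        "Projection: fields=[i, s, vec, _distance]\n"
        "SortExec: TopK(fetch=5), expr=[_distance@0 ASC]\n"
        "KNNIndex: name=vec_idx, k=5, deltas=1"
    )


def run_docstring_example() -> doctest.TestResults:
    """Parse and run the docstring example above, returning (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        fake_explain_plan.__doc__,
        {"fake_explain_plan": fake_explain_plan},  # names visible to the example
        "fake_explain_plan",
        None,
        0,
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


results = run_docstring_example()
print(results)
```

The parts of the plan that vary between runs (the sort expression and the index name here) are matched by ..., while the stable parts are still checked literally.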
I'm trying to figure out how to add this to the nodejs sdk. Looking at |
This is true, although you've only implemented it for the legacy Python sync API, which wraps pylance. In the future, we'll also want to add it to the new Python async API. (Eventually the legacy Python sync API will be replaced by wrapping the async API, and LanceDB will no longer depend on pylance.) The async API also doesn't have a scanner directly exposed.
You will need to add a Line 33 in 56b4fd2
And that should call an lancedb/rust/lancedb/src/query.rs Line 430 in 56b4fd2
Doing this for the Python async API will be very similar.
523d832 to 1c56db8
Thanks for the explanation above. I've now implemented the method for the Python async API and the nodejs API. Could you take a look?
@@ -1756,6 +1757,37 @@ impl TableInternal for NativeTable {
            .await
    }

    async fn explain_plan(&self, query: &VectorQuery, verbose: bool) -> Result<String> {
I feel like this replicates some (but not all) of the logic from create_plan above. Could we instead pull this out to some common function (build_plan?) and then both create_plan and explain_plan can be built off of those?
It's useful to see the underlying query plan for debugging purposes. This exposes LanceScanner's explain_plan function. Addresses #1288