feat: add the explain_plan function #1328

nuvic · 2024-05-28T18:18:47Z

It's useful to see the underlying query plan for debugging purposes. This exposes LanceScanner's explain_plan function. Addresses #1288

github-actions · 2024-05-28T18:19:07Z

ACTION NEEDED

Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

nuvic · 2024-05-28T18:34:17Z

Hi @wjones127, I've added the explain_plan function to the python package. Just wanted to check that this is the correct approach before I continue. Thanks

wjones127

I was thinking explain_plan would go on LanceQueryBuilder, so users could call it on the exact query they are building. If you do it like this, users have to figure out how to translate their parameters into this query.

There is a method _execute_query, where the plan is executed. You would do the same thing in there, but call explain_plan() after the scanner() call:

lancedb/python/python/lancedb/table.py

Lines 1670 to 1684 in 2021d5a

        return ds.scanner(
            columns=query.columns,
            filter=query.filter,
            prefilter=query.prefilter,
            nearest={
                "column": query.vector_column,
                "q": query.vector,
                "k": query.k,
                "metric": query.metric,
                "nprobes": query.nprobes,
                "refine_factor": query.refine_factor,
            },
            with_row_id=query.with_row_id,
            batch_size=batch_size,
        ).to_reader()
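The reviewer's point is that the plan being explained should come from exactly the same scanner construction as the plan being executed. A minimal sketch of that idea follows, assuming a hypothetical `Query` dataclass and stand-in `FakeDataset`/`FakeScanner` classes; none of these names (including `_build_scanner` and `explain_query`) are LanceDB's or pylance's actual API, and the plan text is made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Query:
    # Hypothetical stand-in for the query object that _execute_query receives
    vector: List[float]
    vector_column: str = "vector"
    k: int = 10
    filter: Optional[str] = None


class FakeScanner:
    """Stand-in scanner: keeps its arguments so it can describe them."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

    def explain_plan(self, verbose: bool = False) -> str:
        nearest = self.kwargs["nearest"]
        plan = f"KNN: column={nearest['column']}, k={nearest['k']}"
        if self.kwargs["filter"]:
            plan += f"\n  Filter: {self.kwargs['filter']}"
        if verbose:
            plan += f"\n  batch_size={self.kwargs['batch_size']}"
        return plan

    def to_reader(self) -> List[dict]:
        return []  # a real scanner would stream record batches here


class FakeDataset:
    def scanner(self, **kwargs: Any) -> FakeScanner:
        return FakeScanner(**kwargs)


def _build_scanner(
    ds: FakeDataset, query: Query, batch_size: Optional[int] = None
) -> FakeScanner:
    # The one place that maps query fields onto scanner arguments
    return ds.scanner(
        filter=query.filter,
        nearest={"column": query.vector_column, "q": query.vector, "k": query.k},
        batch_size=batch_size,
    )


def execute_query(
    ds: FakeDataset, query: Query, batch_size: Optional[int] = None
) -> List[dict]:
    return _build_scanner(ds, query, batch_size).to_reader()


def explain_query(ds: FakeDataset, query: Query, verbose: bool = False) -> str:
    # Same scanner construction as execute_query, but explain instead of run
    return _build_scanner(ds, query).explain_plan(verbose)
```

Because both paths go through `_build_scanner`, a user calling `explain_query` on the same query they would execute sees the plan for exactly those parameters, with no manual translation step.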

nuvic · 2024-05-31T05:04:20Z

python/python/lancedb/query.py

@@ -417,6 +417,35 @@ def with_row_id(self, with_row_id: bool) -> LanceQueryBuilder:
        self._with_row_id = with_row_id
        return self

+    def explain_plan(self, verbose: Optional[bool] = False) -> str:


Thanks, that makes more sense. Added to LanceQueryBuilder

wjones127

There are a few more changes you'll need to make to get the doctests to pass, but otherwise this looks good.

wjones127 · 2024-05-31T18:39:13Z

python/python/lancedb/query.py

+        >>> import lancedb
+        >>> db = lancedb.connect("./.lancedb")
+        >>> table = db.open_table("my_table")


For the example to work, you'll need to create the table:

Suggested change

-        >>> import lancedb
-        >>> db = lancedb.connect("./.lancedb")
-        >>> table = db.open_table("my_table")
+        >>> import lancedb
+        >>> db = lancedb.connect("./.lancedb")
+        >>> db.create_table("my_table", [{"vector": [99, 99]}])
+        >>> table = db.open_table("my_table")

wjones127 · 2024-05-31T18:43:04Z

python/python/lancedb/query.py

+        >>> plan = table.search(query).explain_plan(True)
+        >>> print(plan)


To make the example more informative and get it to pass, you should include the printed output. Also, you can use ... as a wildcard so you don't have to write it all out.

Suggested change

-        >>> plan = table.search(query).explain_plan(True)
-        >>> print(plan)
+        >>> table.search(query).explain_plan(True)
+        >>> print(plan)
+        Projection: fields=[i, s, vec, _distance]
+          Take: columns=\"_distance, _rowid, vec, i, s\"
+            SortExec: TopK(fetch=5), expr=...
+              KNNIndex: name=..., k=5, deltas=1
+                ScalarIndexQuery: query=i > 10

To run the doctests locally, you can run:

pytest --doctest-modules python/lancedb
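To show how the ... wildcard behaves, here is a self-contained sketch using only the standard library `doctest` module. In plain `doctest`, ... only acts as a wildcard when the `ELLIPSIS` option flag is enabled, here via an inline `# doctest: +ELLIPSIS` directive; whether LanceDB's pytest setup enables it globally is not stated above and is an assumption. `fake_plan` is a hypothetical function that just returns a plan-shaped string.

```python
import doctest


def fake_plan() -> str:
    """Return a plan-shaped string (hypothetical, for illustration only).

    >>> print(fake_plan())  # doctest: +ELLIPSIS
    Projection: fields=[i, s, vec, _distance]
      Take: columns="_distance, _rowid, vec, i, s"
        SortExec: TopK(fetch=5), expr=...
          KNNIndex: name=..., k=5, deltas=1
    """
    return (
        "Projection: fields=[i, s, vec, _distance]\n"
        '  Take: columns="_distance, _rowid, vec, i, s"\n'
        "    SortExec: TopK(fetch=5), expr=[_distance@0 ASC NULLS LAST]\n"
        "      KNNIndex: name=vector_idx, k=5, deltas=1"
    )


if __name__ == "__main__":
    # Run only the docstring example above and report (failed, attempted)
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner()
    for test in finder.find(fake_plan, globs={"fake_plan": fake_plan}):
        print(runner.run(test))
```

The expected output elides the sort expression and index name with ..., so the example keeps passing even if those details change between versions.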

Oh I see. Okay, I've updated the examples so they pass the doctests.
2aabbbc
f3b9624

nuvic · 2024-06-03T06:03:49Z

I'm trying to figure out how to add this to the nodejs sdk. Looking at nodejs/src, there doesn't seem to be a Scanner exposed there, so it's not as simple as the python solution.
Since we want to put explain_plan in the Query data structure (nodejs/src/query.rs), we need to get access to a dataset ref there, to get to the scanner. What do you recommend?

wjones127 · 2024-06-03T17:13:20Z

> I'm trying to figure out how to add this to the nodejs sdk. Looking at nodejs/src, there doesn't seem to be a Scanner exposed there, so it's not as simple as the python solution.

This is true, although you've only implemented it for the legacy Python sync API, which wraps pylance. In the future, we'll also want to add it to the new Python async API. (Eventually the legacy Python sync API will be replaced by wrapping the async API, and LanceDB will no longer depend on pylance.) The async API also doesn't have a scanner directly exposed.

> Since we want to put explain_plan in the Query data structure (nodejs/src/query.rs), we need to get access to a dataset ref there, to get to the scanner. What do you recommend?

You will need to add an explain_plan method to the Rust Query structure here:

lancedb/nodejs/src/query.rs

Line 33 in 56b4fd2

impl Query {

And that should call an explain_plan method you'll have to add to the ExecutableQuery trait in this file:

lancedb/rust/lancedb/src/query.rs

Line 430 in 56b4fd2

pub trait ExecutableQuery {

Doing this for the Python async API will be very similar.
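The layering described above can be sketched in Python as an analogy; it is not the actual Rust code. Assumptions: `ExecutableQuery` is an `abc.ABC` standing in for the Rust trait, `TableInternal` is a hypothetical backend that produces the plan text, `CoreQuery` stands in for the core query type, and `NodeQuery` plays the role of the nodejs binding's `Query`. The point is that the binding never touches a scanner; it only forwards to the `explain_plan` method added to the shared interface.

```python
from abc import ABC, abstractmethod
from typing import Optional


class TableInternal:
    """Hypothetical backend that knows how to plan a query."""

    def explain_plan(self, predicate: Optional[str], verbose: bool) -> str:
        plan = "Scan: table"
        if predicate:
            plan = f"Filter: {predicate}\n  {plan}"
        return plan + ("\n(verbose)" if verbose else "")


class ExecutableQuery(ABC):
    """Stand-in for the Rust ExecutableQuery trait."""

    @abstractmethod
    def explain_plan(self, verbose: bool = False) -> str:
        ...


class CoreQuery(ExecutableQuery):
    """Core query type: delegates planning to the table it was built from."""

    def __init__(self, table: TableInternal) -> None:
        self._table = table
        self._predicate: Optional[str] = None

    def where(self, predicate: str) -> "CoreQuery":
        self._predicate = predicate
        return self

    def explain_plan(self, verbose: bool = False) -> str:
        return self._table.explain_plan(self._predicate, verbose)


class NodeQuery:
    """Stand-in for the nodejs binding's Query: wraps the core query and forwards."""

    def __init__(self, inner: CoreQuery) -> None:
        self._inner = inner

    def explain_plan(self, verbose: bool = False) -> str:
        return self._inner.explain_plan(verbose)
```

For example, `NodeQuery(CoreQuery(TableInternal()).where("i > 10")).explain_plan()` returns the two-line plan `Filter: i > 10` over `Scan: table`; a Python async binding would wrap `CoreQuery` the same way.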

nuvic · 2024-06-24T03:58:49Z

Thanks for the explanation above. I've now implemented the method for the python async API and the nodejs API. Could you take a look?

wjones127 · 2024-06-25T15:39:00Z

rust/lancedb/src/table.rs

@@ -1756,6 +1757,37 @@ impl TableInternal for NativeTable {
            .await
    }

+    async fn explain_plan(&self, query: &VectorQuery, verbose: bool) -> Result<String> {


I feel like this replicates some (but not all) of the logic from create_plan above. Could we instead pull that logic out into a common function (build_plan?) so that both create_plan and explain_plan are built on top of it?
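A minimal Python sketch of the proposed refactor follows (the real code is Rust in rust/lancedb/src/table.rs). `Plan`, `VectorQuery`, `build_plan`, and the plan contents here are hypothetical: one function turns the query into a plan, `create_plan` returns it for execution, and `explain_plan` only renders it, so the two paths cannot drift apart.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Plan:
    # Hypothetical plan representation: (operator, detail) nodes, root first
    nodes: List[Tuple[str, str]]

    def display(self, verbose: bool = False) -> str:
        lines = []
        for depth, (op, detail) in enumerate(self.nodes):
            text = f"{op}: {detail}" if verbose else op
            lines.append("  " * depth + text)
        return "\n".join(lines)


@dataclass
class VectorQuery:
    k: int
    filter: Optional[str] = None


def build_plan(query: VectorQuery) -> Plan:
    """Single source of truth for turning a query into a plan."""
    nodes = [("Projection", "fields=[vec, _distance]"), ("KNN", f"k={query.k}")]
    if query.filter:
        nodes.append(("Filter", query.filter))
    return Plan(nodes)


def create_plan(query: VectorQuery) -> Plan:
    # Execution path: would hand this plan to an executor
    return build_plan(query)


def explain_plan(query: VectorQuery, verbose: bool = False) -> str:
    # Explain path: the same plan, rendered as indented text
    return build_plan(query).display(verbose)
```

Any change to how plans are built (a new filter node, say) lands in `build_plan` and is automatically reflected in both execution and explanation.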

github-actions bot added enhancement New feature or request Python Python SDK labels May 28, 2024

nuvic force-pushed the feat/explain-plan-in-sdk branch from 18d8876 to 2021d5a May 28, 2024 18:26

nuvic changed the title ~~feat: Add the explain_plan function~~ feat: add the explain_plan function May 28, 2024

wjones127 reviewed May 29, 2024

nuvic commented May 31, 2024

wjones127 reviewed May 31, 2024

github-actions bot added the Rust Rust related issues label Jun 17, 2024

nuvic added 11 commits June 23, 2024 20:49

feat(python): add the explain_plan function (lancedb#1288)

ba7b12f

remove explain_plan from table.py

8214014

add explain_plan to LanceQueryBuilder

750f5fc

test explain_plan in query

8f5e70b

update explain_plan example with create_table

3416a72

fix doctest example

9b86e48

expose explain_plan in rust lib

a3e3cb1

expose to nodejs sdk

e7b3e53

expose explain_plan in python async sdk

1ff90bd

update nodejs example doc

43436ee

python add docs for async method

1c56db8

nuvic force-pushed the feat/explain-plan-in-sdk branch from 523d832 to 1c56db8 June 24, 2024 03:53

fix merge conflict

c0979df

nuvic marked this pull request as ready for review June 24, 2024 03:56

wjones127 reviewed Jun 25, 2024

nuvic commented May 28, 2024 • edited

github-actions bot commented May 28, 2024

nuvic commented May 28, 2024

wjones127 left a comment

nuvic May 31, 2024

wjones127 left a comment

wjones127 May 31, 2024

wjones127 May 31, 2024

nuvic Jun 2, 2024 • edited

nuvic commented Jun 3, 2024

wjones127 commented Jun 3, 2024

nuvic commented Jun 24, 2024

wjones127 Jun 25, 2024
