Add support for Weaviate #180

hsm207 · 2024-06-12T05:28:22Z

Signed-off-by: hsm207 <[email protected]>

treygrainger · 2024-06-24T08:50:13Z

engines/weaviate/WeaviateEngine.py

+
+    def enable_ltr(self, collection):
+        "Initializes LTR dependencies for a given collection"
+        raise NotImplementedError("¯\\_(ツ)_/¯")


This method may go away (TBD). It's currently just a hook to tell the engine to adjust its configuration, if needed, to support LTR on the given collection. In the case of Weaviate, I assume the LTR model will be running and invoked outside the engine, so you'll either have it always running (in which case enable_ltr can be a no-op) or you can use enable_ltr to copy any data/models/config needed into place.
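To illustrate the no-op option described above, here is a minimal sketch. The class name `WeaviateEngineSketch` and the `ltr_enabled` set are hypothetical stand-ins, not code from this PR, and the collection is passed by name purely for brevity. It assumes the LTR model is served outside Weaviate, so nothing has to change inside the engine.

```python
class WeaviateEngineSketch:
    """Hypothetical stand-in for the engine class touched by this PR."""

    def __init__(self):
        # Hypothetical bookkeeping: names of collections LTR was requested for.
        self.ltr_enabled = set()

    def enable_ltr(self, collection):
        """Initializes LTR dependencies for a given collection.

        Assumption: the LTR model runs outside the engine, so there is nothing
        to configure in Weaviate itself. The call only records the request.
        """
        self.ltr_enabled.add(collection)


engine = WeaviateEngineSketch()
engine.enable_ltr("products")
engine.enable_ltr("products")  # calling it twice is harmless
```

If data, models, or config did need to be copied into place, that work would replace the single bookkeeping line inside `enable_ltr`.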

treygrainger · 2024-06-24T09:20:06Z

@hsm207 - Awesome to see you working on this. If you have any questions, don't hesitate to reach out.

After doing several other implementations, here's a checklist of the key things you'll come across:

- Dockerfile / Docker Compose configuration
- Install Spark connector (inside the aips-notebooks Dockerfile)
- Collection management: creation/deletion/healthcheck
- Collection schemas:
  - Primitive field types: text, string, keyword, boolean, integer, double
  - Location coordinate field
  - Dense vector field: dimensions (512, 768), vector encoding/quantization (1 bit, 32 bits), and dot_product similarity
  - Tokenizers/filters: comma delimited, lowercase, whitespace/punctuation, NGram, delimited payload
- Query functionality:
  - sorting, filtering, limit, query fields, return fields
  - multi-field search
  - AND/OR/NOT operators
  - minimum phrase matching
  - query time boosting
  - index time boosting
  - vector search
  - reranking by query
  - highlighting
  - debug/explain
  - spell check/autocomplete
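For the collection schema items in the checklist, one possible approach is a small lookup table from the generic field types to Weaviate property settings. The sketch below is only an illustration: the pairing of each generic type with a Weaviate `dataType` and `tokenization` value is an assumption, and `FIELD_TYPE_MAP` and `build_class_definition` are hypothetical names that do not appear in this PR. It builds a plain dict in the shape of a Weaviate class definition and does not call the Weaviate client.

```python
import json

# Assumed mapping from the checklist's generic field types to Weaviate
# property settings. The pairing is illustrative, not taken from the PR.
FIELD_TYPE_MAP = {
    "text": {"dataType": ["text"], "tokenization": "word"},
    "string": {"dataType": ["text"], "tokenization": "field"},
    "keyword": {"dataType": ["text"], "tokenization": "field"},
    "boolean": {"dataType": ["boolean"]},
    "integer": {"dataType": ["int"]},
    "double": {"dataType": ["number"]},
    "location": {"dataType": ["geoCoordinates"]},
}


def build_class_definition(name, fields, distance="dot"):
    """Builds a Weaviate-style class definition from {field name: generic type}."""
    properties = [
        {"name": field, **FIELD_TYPE_MAP[generic_type]}
        for field, generic_type in fields.items()
    ]
    return {
        "class": name,
        "properties": properties,
        # "dot" corresponds to the dot_product similarity in the checklist.
        "vectorIndexConfig": {"distance": distance},
    }


definition = build_class_definition(
    "Products",
    {"title": "text", "upc": "keyword", "in_stock": "boolean", "price": "double"},
)
print(json.dumps(definition, indent=2))
```

Keeping the mapping in one table means an unsupported generic type fails loudly with a `KeyError` instead of silently producing a wrong schema.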

Some other things, like hybrid search (reciprocal rank fusion), are already implemented generically at the Collection level, but you can override them in the WeaviateCollection to push them down into the engine, since Weaviate has native support built in.
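For reference, reciprocal rank fusion itself is small enough to sketch in a few lines. This is a generic illustration of the technique, not the repository's Collection-level code: the function name, the `k=60` default (a commonly used constant), and the tie-breaking rule are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuses ranked lists of doc ids: score(d) = sum of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


lexical = ["a", "b", "c"]  # e.g. results of a keyword query
vector = ["c", "a", "d"]   # e.g. results of a vector query
fused = reciprocal_rank_fusion([lexical, vector])
```

A WeaviateCollection override would skip this client-side fusion and delegate to the engine's native hybrid search instead.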

As mentioned in /engines/README.md, the LTR implementation is required, but can be done outside the engine. Happy to chat with you on this if you need a generic implementation. The SparseLexicalSemanticSearch implementation is likewise required, but it's just crafting some very specific Weaviate query syntax for a handful of specific query patterns (popularity boosting, geo radius filtering, etc.). I wouldn't worry about the EntityExtractor or the SemanticKnowledgeGraph, as most engines don't have these built in and you just treat them as external library calls.

At any rate, hope that's helpful. Let us know if you have any questions we can assist with!

hsm207 added 12 commits June 12, 2024 03:36

- da66e05 add weaviate container
- b0e3370 fix weaviate port mapping
- 3f7903e update dependencies
- b90f340 stub weaviate engine
- 141d4d5 implement health check
- 66a57c6 update dependencies
- 8b40991 implement create_collection
- 3cb5943 reformat code
- 6bb6237 implement get collection
- f19e617 code cleanup
- 8554402 implement enable ltr
- 0c60901 code cleanup

Each commit carries: Signed-off-by: hsm207 <[email protected]>

treygrainger reviewed Jun 24, 2024
