Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite query generation REGEXP doesn't work. #4540

Open
ryanhamilton opened this issue Jun 5, 2024 · 8 comments
Open

sqlite query generation REGEXP doesn't work. #4540

ryanhamilton opened this issue Jun 5, 2024 · 8 comments

Comments

@ryanhamilton
Copy link

What's up?

By default PRQL seems to generate sqlite queries with REGEXP.
This means code like this doesn't work:
timeseries/qstudio#50
If where possible PRQL avoided REGEXP and used LIKE many more queries may work out of the box.
REGEXP is also risky to use as the implentation depends on the platform so users may get different results.

@max-sixty
Copy link
Member

Is there a minimal example? I see the SQL in the linked issue but can't see the PRQL.

@richb-hanover
Copy link
Contributor

richb-hanover commented Jun 5, 2024

timeseries/qstudio#50 is my report. The underlying conundrum is that SQLite does not ship with a built-in REGEXP facility. When qStudio calls prqlc, even if the query has prql target:sql.sqlite, PRQL still generates the call to a REGEX function, which causes the (standard) SQLite to throw the error shown.

There are a few alternatives:

  • Remove regex from PRQL (I am not willing to give this up)
  • Remove regex from PRQL if the target is SQLite (not attractive, either)
  • Find a way to add regex processing to the SQLite that qStudio uses (maybe nalgeon? maybe some other project?)

Update: @ryanhamilton has also raised an issue at nalgeon/sqlean#119

@max-sixty
Copy link
Member

OK, thanks.

So it sounds the minimal example:

prql target:sql.sqlite

from foo
derive x ~= 'bar$'

...shouldn't generate...

SELECT
  *,
  x REGEXP 'bar$'
FROM
  foo

-- Generated by PRQL compiler version:0.11.3 (https://prql-lang.org)

...I do remember seeing some issues that regex isn't natively implemented by sqlite.


One option would be to inspect the regex and generate a LIKE if it's simple enough — i.e. in the case above it could generate a LIKE "%bar". I think that would be a fairly easy algo to write — basically if there are no special characters, just add % to the start & end unless a ^ or $ exist, and send. (Need to think about case etc too)

@richb-hanover
Copy link
Contributor

Let's hold off to see what Ryan discovers. (I'm not in a rush - the like function #1123 serves my needs now.)

If it turns out to be straightforward to get more addin functions into the SQLite that's bundled into qStudio, that'll be a win for all.

@ryanhamilton
Copy link
Author

Sqlean said now. Last chance is for xerial to create a fat jar:
xerial/sqlite-jdbc#1125

@richb-hanover
Copy link
Contributor

Thanks for the update

@PrettyWood
Copy link
Collaborator

PrettyWood commented Jun 11, 2024

@ryanhamilton @richb-hanover Doesn't text.contains solve your issue https://prql-lang.org/book/reference/stdlib/text.html?
Using regexp is expensive and indeed having starts_with, contains and endswith cover the majority of scenarios

@richb-hanover
Copy link
Contributor

@PrettyWood - that appears to be a very good workaround. Here is the use-case that triggered the original report:

qStudio's SQLite implementation doesn't bundle in the REGEXP addin, and therefore gives an error. @ryanhamilton is looking to see if there is another SQLite JDBC that might bundle it.

You are correct that it covers the majority of situations well. I am content to close this issue as "Solved".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants