Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #24638 -- Added support for SQL comments #18258

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

viciu
Copy link
Contributor

@viciu viciu commented Jun 8, 2024

This PR reactivates work done in #4495 and #15711 - useful addition to Django for adding traceability for queries.

Trac ticket number

ticket-24638

Branch description

Add queryset comment() method

  • Rebase against 5.x
  • Support SELECT statement
  • Support UPDATE statement
  • Support INSERT statement
  • Support DELETE statement
  • use regex instead if "*/"

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.
  • I have attached screenshots in both light and dark modes for any UI changes.

@viciu viciu changed the title Fixed #24638 -- Add support for SQL comments in SELECT and UPDATE que… Fixed #24638 -- Add support for SQL comments Jun 8, 2024
Copy link
Sponsor Member

@adamchainz adamchainz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start, looking forward to your iteration tomorrow.

The PR description formatting needs a little fixing.

django/db/models/sql/compiler.py Outdated Show resolved Hide resolved
django/db/models/sql/compiler.py Outdated Show resolved Hide resolved
@viciu viciu force-pushed the 24638-query-comments branch 2 times, most recently from 19932f2 to 3d75c28 Compare June 9, 2024 09:58
@viciu viciu changed the title Fixed #24638 -- Add support for SQL comments Fixed #24638 -- Added support for SQL comments Jun 9, 2024
.. method:: comment(message)

Adds an SQL comment to be inserted into the resultant SQL statement, after
``SELECT``/``UPDATE``, and if applicable, ``DISTINCT``.. For example::
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not possible to make it work for INSERT, at least when using Model.objects.comment(...).create(...) ?

docs/ref/models/querysets.txt Outdated Show resolved Hide resolved
docs/ref/models/querysets.txt Outdated Show resolved Hide resolved
docs/releases/5.2.txt Outdated Show resolved Hide resolved
@viciu viciu force-pushed the 24638-query-comments branch 4 times, most recently from fd32c25 to 076612e Compare June 9, 2024 14:51
django/db/models/sql/compiler.py Outdated Show resolved Hide resolved
django/db/models/sql/compiler.py Outdated Show resolved Hide resolved
with CaptureQueriesContext(connection) as captured_queries:
list(NamedCategory.objects.comment("! STRAIGHT_JOIN"))
list(NamedCategory.objects.comment("").comment("some comment"))
self.assertIn("SELECT /* ! STRAIGHT_JOIN */ ", captured_queries[0]["sql"])
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use startswith in the tests for more accuracy

Suggested change
self.assertIn("SELECT /* ! STRAIGHT_JOIN */ ", captured_queries[0]["sql"])
self.assertTrue(captured_queries[0]["sql"].startswith("SELECT /* ! STRAIGHT_JOIN */ "))

Also, I see you're testing with an example of a MySQL-specific ! comment. But that would require no space before ! to work, so it would be /*! STRAIGHT JOIN */. See the docs: https://dev.mysql.com/doc/refman/8.0/en/comments.html .

We agreed not to support any such comments in the old PR ( #15711 (comment) ), so let's drop any references to them in the tests. Let’s instead have test data like the intended use case, like blog/views.py:123. (This applies to all the tests. I find the COUNT / DISTINCT tests a bit weird to read, using the keywords.)

Comment on lines +4654 to +4661
for comment in ["foo=/a/b/c*/", "/*foo=/a/b/c", "**//SELECT nothing;//**"]:
msg = (
"Cannot pass strings containing /* or */ to comment(). "
"Escape or strip these delimiters before calling comment()."
)

with self.subTest(comment), self.assertRaisesMessage(ValueError, msg):
NamedCategory.objects.comment(comment)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can hoist msg and split the subtests one per line, to make it easier to read and add any more in the future.

Suggested change
for comment in ["foo=/a/b/c*/", "/*foo=/a/b/c", "**//SELECT nothing;//**"]:
msg = (
"Cannot pass strings containing /* or */ to comment(). "
"Escape or strip these delimiters before calling comment()."
)
with self.subTest(comment), self.assertRaisesMessage(ValueError, msg):
NamedCategory.objects.comment(comment)
msg = (
"Cannot pass strings containing /* or */ to comment(). "
"Escape or strip these delimiters before calling comment()."
)
for comment in [
"foo=/a/b/c*/",
"/*foo=/a/b/c",
"**//SELECT nothing;//**",
]:
with self.subTest(comment), self.assertRaisesMessage(ValueError, msg):
NamedCategory.objects.comment(comment)

Comment on lines +4665 to +4670
NamedCategory.objects.create(name="test")
NamedCategory.objects.comment("DELETE 1").all().delete()

NamedCategory.objects.create(name="test")
NamedCategory.objects.comment("DELETE 2").all().delete()
self.assertIn("DELETE FROM /* DELETE 1 */", captured_queries[1]["sql"])
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We don't need to create any data in order to run DELETE
  2. Only one delete query needed
  3. Use startswith and more realistic example data
Suggested change
NamedCategory.objects.create(name="test")
NamedCategory.objects.comment("DELETE 1").all().delete()
NamedCategory.objects.create(name="test")
NamedCategory.objects.comment("DELETE 2").all().delete()
self.assertIn("DELETE FROM /* DELETE 1 */", captured_queries[1]["sql"])
NamedCategory.objects.comment("blog/views.py:50").all().delete()
self.assertTrue(
captured_queries[0]["sql"].startswith(
"DELETE /* blog/views.py:50 */ FROM",
)
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that delete does not generate delete query, but select:

This

NamedCategory.objects.comment("DELETE 2").delete()

will fail:

FAIL: test_delete (queries.tests.SqlCommentsInQueriesTests.test_delete)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/viciu/prog/django/tests/queries/tests.py", line 4670, in test_delete
    self.assertIn("DELETE FROM /* DELETE 1 */", [q['sql'] for q in captured_queries])
AssertionError: 'DELETE FROM /* DELETE 1 */' not found in 
['SELECT /* DELETE 2 */ "queries_dumbcategory"."id", "queries_namedcategory"."dumbcategory_ptr_id", "queries_namedcategory"."name" FROM "queries_namedcategory" INNER JOIN "queries_dumbcategory" ON ("queries_namedcategory"."dumbcategory_ptr_id" = "queries_dumbcategory"."id")']

therefore I was experimenting with adding some records first and then deleting them. If I understand correctly it looks like django fetches records to delete first, then updates related records with nulls, then finally deletes them one by one (including related records):

FAIL: test_delete (queries.tests.SqlCommentsInQueriesTests.test_delete)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/viciu/prog/django/tests/queries/tests.py", line 4670, in test_delete
    self.assertIn("DELETE FROM /* DELETE 1 */", [q['sql'] for q in captured_queries])
AssertionError: 'DELETE FROM /* DELETE 1 */' not found in 
[
'INSERT INTO "queries_dumbcategory" ("id") VALUES (NULL)', 
'INSERT INTO "queries_namedcategory" ("dumbcategory_ptr_id", "name") VALUES (1, \'test\')', 
'SELECT /* DELETE 1 */ "queries_dumbcategory"."id", "queries_namedcategory"."dumbcategory_ptr_id", "queries_namedcategory"."name" FROM "queries_namedcategory" INNER JOIN "queries_dumbcategory" ON ("queries_namedcategory"."dumbcategory_ptr_id" = "queries_dumbcategory"."id")', 
'UPDATE "queries_tag" SET "category_id" = NULL WHERE "queries_tag"."category_id" IN (1)', 
'DELETE FROM "queries_namedcategory" WHERE "queries_namedcategory"."dumbcategory_ptr_id" IN (1)',
'DELETE FROM "queries_dumbcategory" WHERE "queries_dumbcategory"."id" IN (1)', 

'INSERT INTO "queries_dumbcategory" ("id") VALUES (NULL)', 
'INSERT INTO "queries_namedcategory" ("dumbcategory_ptr_id", "name") VALUES (2, \'test\')', 
'SELECT /* DELETE 2 */ "queries_dumbcategory"."id", "queries_namedcategory"."dumbcategory_ptr_id", "queries_namedcategory"."name" FROM "queries_namedcategory" INNER JOIN "queries_dumbcategory" ON ("queries_namedcategory"."dumbcategory_ptr_id" = "queries_dumbcategory"."id")', 
'UPDATE "queries_tag" SET "category_id" = NULL WHERE "queries_tag"."category_id" IN (2)', 
'DELETE FROM "queries_namedcategory" WHERE "queries_namedcategory"."dumbcategory_ptr_id" IN (2)', 
'DELETE FROM "queries_dumbcategory" WHERE "queries_dumbcategory"."id" IN (2)']

@adamchainz
Copy link
Sponsor Member

adamchainz commented Jun 9, 2024

Thanks for putting sprint time in on this! I found a bit more time here to review, hope this helps.

hannseman and others added 2 commits June 10, 2024 00:50
…ueries.

Co-authored-by: Adam Johnson <[email protected]>
Co-authored-by: Wiktor Kolodziej <[email protected]>
- Moved wrapping to comments in QuerySet.comment()
- Removed comments_sql()
- Added re.search instead of use if */
- WIP INSERT and DELETE
Copy link
Member

@charettes charettes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the desire to help workaround MySQL'ism but I find it strange that we default to injecting comments where it can be useful for the latter on all backends given the main purpose of these is likely to be traceability which is harder to parse when it's right in the middle SQL.

Could we have some backends level option that at least configure where they should live and default to prepending / postpending them on non-MySQL/Oracle backends?

delete = "DELETE FROM %s" % self.quote_name_unless_alias(query.base_table)
result = [
"DELETE",
*self.query.comments,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale for injecting comments here instead of at the end of the query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comment. Queries can be long and truncated - please refer to the comment #15711 (comment)

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the idea is that in a running query list, you can see the type of query first, then any tracing info. I saw this recommended by MySQL experts years ago, hence my suggestion for it in the previous PR. This is not for supporting any MySQL specific options.

I'm not so sure now that we should try to support inserting comments at any of the positions used for query hints. That would require a very complex API and might still not cover all cases. Also, database instrumentation exists to allow rewriting queries when needed.

Copy link
Member

@charettes charettes Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw this recommended by MySQL experts years ago.

Do you have references for that? Isn't it fair to assume that an approach that works well for MySQL and recommended by these experts might not work so well for other backends (e.g. Postgres). I've seen a few solutions default prepending the traces to queries to ensure they are never truncated and APMs requiring that the comment be pre or postpended.

At work we built a solution that postpends commented traces specifically for MySQL and it appears to be working well hence my curiosity here.

Copy link
Member

@charettes charettes Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an internal discussion with DBREs

  1. Some tooling doesn’t support parsing in the middle of a query
  2. Keeping queries starting with their respective verb makes it easier to filter
  3. Putting the trace at the end of query might cause it to be truncated off on some systems

This makes me think that a proper solution would need to be configurable (PREPEND, POSTVERB, POSTPEND)?

django/db/models/sql/compiler.py Show resolved Hide resolved
@pgramaca
Copy link

Let me know if there is something i can help with.
I would love to help with this feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants