[CORE-3186] schema registry json schema: array compatibility checks #20137

andijcr · 2024-06-25T16:51:21Z

Implements is_array_superset, the function that performs compatibility checks for "type": "array" json schemas.

Note 1: "type": "array" can express proper arrays or tuples. Tuples have validation logic similar to "type": "object" schemas, so the compatibility check follows similar logic as well.

Note 2: for tuples, the syntax changed between Draft 4 and later drafts. To prevent erroneous validation, the "prefixItems" keyword is added to the list of unimplemented features.

The last commit fixes an edge case for "min__" properties. These have a default value of 0, so the two schemas
{"type": "string"} and {"type": "string", "minLength": 0} should be considered compatible (equivalent, in fact).

Backports Required

Release Notes

none

no functional changes

"type": "array" can model tuples, and in doing so in Draft4 it uses "additionalItems" the same way an object uses "additionalProperties". After Draft4, "items" (in tuple mode) works in the same way as "additionalProperties". this commit adds an enum that is used to parametrize the field name

the "uniqueItems" keyword makes the most sense for the array form, but it is also legal for tuples, so it is checked in the common section of is_array_superset
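A minimal sketch of the "uniqueItems" rule, assuming the keyword defaults to false when omitted. The function name and the std::optional<bool> inputs are hypothetical stand-ins for illustration, not the code in json.cc.

```cpp
#include <optional>

// Hypothetical helper: each argument is the "uniqueItems" value of a schema,
// or std::nullopt when the keyword is omitted.
bool is_unique_items_superset(
  std::optional<bool> older, std::optional<bool> newer) {
    // Assumption: an omitted "uniqueItems" behaves like false.
    const bool older_unique = older.value_or(false);
    const bool newer_unique = newer.value_or(false);
    // older may only demand uniqueness if newer already guarantees it.
    return !older_unique || newer_unique;
}
```

With this rule, an older schema that sets "uniqueItems": true is not a superset of a newer schema that omits the keyword, while an explicit false and an omitted keyword behave the same.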

determine whether a schema models an array or a tuple by checking whether "items" is a single schema or an array of schemas. implement the superset check for the case where both schemas are arrays
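The array-versus-tuple decision can be sketched roughly as follows. The types here are hypothetical stand-ins (a sub-schema is reduced to a type name), and treating an absent "items" as array mode is an assumption based on Draft 4, where a missing "items" acts like the empty schema.

```cpp
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in: a sub-schema is reduced to its "type" name.
using subschema = std::string;

// "items" is absent, a single schema (array mode),
// or a list of schemas (tuple mode).
using items_keyword
  = std::variant<std::monostate, subschema, std::vector<subschema>>;

enum class array_kind { array, tuple };

array_kind classify(const items_keyword& items) {
    // Only an array of schemas selects tuple mode; a single schema or an
    // absent keyword (assumed equivalent to {}) selects array mode.
    return std::holds_alternative<std::vector<subschema>>(items)
             ? array_kind::tuple
             : array_kind::array;
}
```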

the tuple schema check is more involved: each element in newer["items"] must be compatible with the element at the same index of older["items"]. additionally, if newer["items"] has more elements than older["items"], the excess elements must be compatible with older["additionalItems"]. it's a similar process to "type": "object".

adds "prefixItems" to the list of unimplemented keywords: it comes from drafts after Draft4, where tuple schemas are implemented differently, so this check protects against that
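The two tuple rules described above can be sketched like this. Everything in the block is a hypothetical simplification: sub-schemas are plain type names, "any" stands for the empty schema, and std::nullopt models "additionalItems": false. It covers only the two rules stated in the commit message, not the case where older["items"] is the longer list.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in: a sub-schema is a type name; "any" plays the role
// of the empty schema {} that accepts everything.
using subschema = std::string;

struct tuple_schema {
    std::vector<subschema> items;
    // Schema for elements past the end of items; std::nullopt models
    // "additionalItems": false (no extra elements allowed).
    std::optional<subschema> additional_items = "any";
};

// Toy sub-schema check: older accepts newer if it is "any" or the same type.
bool is_superset(const subschema& older, const subschema& newer) {
    return older == "any" || older == newer;
}

bool is_tuple_items_superset(
  const tuple_schema& older, const tuple_schema& newer) {
    const std::size_t common
      = std::min(older.items.size(), newer.items.size());
    // Rule 1: elements at the same index must be compatible.
    for (std::size_t i = 0; i < common; ++i) {
        if (!is_superset(older.items[i], newer.items[i])) {
            return false;
        }
    }
    // Rule 2: excess elements of newer are checked against
    // older's additionalItems.
    for (std::size_t i = common; i < newer.items.size(); ++i) {
        if (
          !older.additional_items.has_value()
          || !is_superset(*older.additional_items, newer.items[i])) {
            return false;
        }
    }
    return true;
}
```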

the check would not work for numeric properties with a default value, in the edge case where one side sets the property explicitly to its default value and the other side leaves it out. for example, {"type": "string", "minLength": 0} and {"type": "string"} should be considered equivalent.

adds a parameter to set the default value if the property is not set
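To illustrate the edge case, here is a rough sketch of a numeric property check with an optional default. It is not the code from json.cc: the std::map stand-in for a JSON object and the function names are assumptions made so the example is self-contained.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a parsed JSON object holding numeric keywords.
using schema = std::map<std::string, double>;

// Returns the keyword's value, or default_value when the keyword is absent.
std::optional<double> get_value(
  const schema& s,
  const std::string& prop,
  std::optional<double> default_value) {
    if (auto it = s.find(prop); it != s.end()) {
        return it->second;
    }
    return default_value;
}

// True when older's constraint on prop is at least as permissive as newer's,
// as judged by pred(older_value, newer_value).
template<typename Pred>
bool is_numeric_property_superset(
  const schema& older,
  const schema& newer,
  const std::string& prop,
  Pred pred,
  std::optional<double> default_value = std::nullopt) {
    const auto older_v = get_value(older, prop, default_value);
    const auto newer_v = get_value(newer, prop, default_value);
    if (!older_v.has_value()) {
        return true; // older places no constraint on prop
    }
    if (!newer_v.has_value()) {
        return false; // older constrains prop, newer does not
    }
    return pred(*older_v, *newer_v);
}
```

Without a default, an older schema with "minLength": 0 is rejected against a newer schema that omits the keyword. Passing 0 as the default makes both sides compare as 0, so the pair is accepted.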

andijcr · 2024-06-28T09:01:55Z

Failure is #20315

BenPope

Nice

BenPope · 2024-06-28T09:30:38Z

src/v/pandaproxy/schema_registry/json.cc

+    // get value or default_value
+    auto get_value = [&](json::Value const& v) -> std::optional<double> {
+        auto it = v.FindMember(
+          json::Value{prop_name.data(), rapidjson::SizeType(prop_name.size())});


It might be worth creating a helper from std::string_view to StringRef to highlight the lifetime and hide the cast.

pgellert · 2024-06-28T09:10:23Z

src/v/pandaproxy/schema_registry/test/test_json_schema.cc

+    .reader_schema = R"({"type": "array", "minItems": 2, "maxItems": 10})",
+    .writer_schema = R"({"type": "array", "maxItems": 11})",
+    .reader_is_compatible_with_writer = false,
+  },


nit: is it worth splitting this into 2 tests, one that changes minItems and another that changes maxItems to show that neither change is allowed (independently of one another)?

good point, i'll follow up
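A small sketch of why the two changes can be tested independently, assuming "minItems" defaults to 0 and an omitted "maxItems" means unbounded. The struct and function names are hypothetical and only model the size check.

```cpp
#include <limits>
#include <optional>

// Hypothetical model of the size keywords of an array schema.
struct array_bounds {
    std::optional<double> min_items; // omitted: assumed to mean 0
    std::optional<double> max_items; // omitted: assumed to mean unbounded
};

// The reader accepts every array length the writer can produce only if its
// [min, max] range contains the writer's range.
bool is_size_superset(const array_bounds& reader, const array_bounds& writer) {
    constexpr double inf = std::numeric_limits<double>::infinity();
    return reader.min_items.value_or(0.0) <= writer.min_items.value_or(0.0)
           && reader.max_items.value_or(inf) >= writer.max_items.value_or(inf);
}
```

Under these assumptions, a reader with only "minItems": 2 fails against a writer without it, and a reader with "maxItems": 10 fails against a writer with "maxItems": 11, so each bound is enough on its own to break compatibility.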

pgellert · 2024-06-28T09:51:33Z

src/v/pandaproxy/schema_registry/json.cc

+  VPred&& value_predicate,
+  std::optional<double> default_value = std::nullopt) {
+    // get value or default_value
+    auto get_value = [&](json::Value const& v) -> std::optional<double> {


I think this only fixes the bug for the numeric properties but the same bug exists for other calls to extract_property_and_gate_check. If I understand it correctly, the bug is that extract_property_and_gate_check doesn't handle the default value of the field.
For example, this test case would fail - I believe incorrectly:

{
  .reader_schema = R"({"type": "number", "minimum": 11, "exclusiveMinimum": false})",
  .writer_schema = R"({"type": "number", "minimum": 11})",
  .reader_is_compatible_with_writer = true,
},

You are right, but extract_property_and_gate_check cannot handle this case cleanly.
"exclusiveMinimum" (or _maximum) is kind of tricky because it's a boolean with a default value in draft4 but a number without a default value from draft6.
Currently, is_numeric_superset doesn't know the schema dialect, so it cannot assign a meaningful default to the property.
We have json schema normalization on the roadmap that should fix this problem at the root, by explicitly setting all the implicit default values based on the schema dialect in use.
So if you are ok with this, i'd move this fix to a follow-up or to the json schema normalization PR
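As a rough illustration of why the dialect matters here, the default for "exclusiveMinimum" could be modelled as below. The enum, the variant alias, and the function are all hypothetical; the planned normalization may look quite different.

```cpp
#include <optional>
#include <variant>

// Hypothetical dialect tag; the real code may identify dialects differently.
enum class dialect { draft4, draft6 };

// "exclusiveMinimum" is a boolean in Draft 4 and a number from Draft 6.
using exclusive_minimum = std::variant<bool, double>;

// Returns the implicit value of "exclusiveMinimum" when the keyword is
// omitted, or std::nullopt when the dialect defines no default.
std::optional<exclusive_minimum> default_exclusive_minimum(dialect d) {
    switch (d) {
    case dialect::draft4:
        return exclusive_minimum{false}; // boolean flag, defaults to false
    case dialect::draft6:
        return std::nullopt; // numeric bound, no default
    }
    return std::nullopt; // unreachable for valid enum values
}
```

A check that does not know the dialect cannot choose between these two branches, which is the limitation described above.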

pgellert · 2024-06-28T09:55:49Z

src/v/pandaproxy/schema_registry/test/test_json_schema.cc

+    .reader_schema = R"({"type": "array", "uniqueItems": true})",
+    .writer_schema = R"({"type": "array"})",
+    .reader_is_compatible_with_writer = false,
+  },


nit: is it worth adding a test here for the default value handling?

yes, i'll follow up

andijcr · 2024-06-28T17:19:04Z

list of failures:
https://buildkite.com/redpanda/redpanda/builds/50841#01905bc0-ab10-4f08-b09a-60addd1cac28
#18014

https://buildkite.com/redpanda/redpanda/builds/50841#01905d8f-b142-4c5b-b9fb-ae2df4ec46fa
https://buildkite.com/redpanda/redpanda/builds/50841#01905d2a-e7f8-4dd5-a20e-329b5a50f732
https://buildkite.com/redpanda/redpanda/builds/50841#01905bd2-64e1-413d-b218-d9d435a04be7
#20315

https://buildkite.com/redpanda/redpanda/builds/50841#01905bd2-64e1-413d-b218-d9d435a04be7
#20574 (fixed after this failure)

github-actions bot added the area/redpanda label Jun 25, 2024

andijcr added 2 commits June 27, 2024 16:48

schema_registry/json: move is_object_additional_properties_superset

701189e


andijcr force-pushed the feat/core-3186/schema_registry_array branch from ff21fa8 to e455b96 Compare June 27, 2024 16:26

andijcr added 5 commits June 27, 2024 22:46

schema_registry/json: is_array_superset size checks

83962e5

schema_registry/json: is_array_superset uniqueItems

4c9789c


schema_registry/json: is_array_superset array schema

04edb4f


andijcr force-pushed the feat/core-3186/schema_registry_array branch from e455b96 to 3c9596f Compare June 27, 2024 20:46

andijcr changed the title ~~[CORE-3186] Feat/core 3186/schema registry array~~ [CORE-3186] schema registry json schema: array compatibility checks Jun 27, 2024

andijcr marked this pull request as ready for review June 27, 2024 20:55

andijcr requested review from BenPope and a team June 27, 2024 20:58

BenPope approved these changes Jun 28, 2024


pgellert reviewed Jun 28, 2024


pgellert approved these changes Jun 28, 2024


BenPope added the area/schema-registry Schema Registry service within Redpanda label Jun 28, 2024

michael-redpanda merged commit 14c8aa4 into redpanda-data:dev Jun 28, 2024
17 of 21 checks passed

andijcr deleted the feat/core-3186/schema_registry_array branch June 28, 2024 17:20
