Extract vocabularies from the specs #1510

gregsdennis · 2024-05-22T09:26:59Z

Relates to https://github.com/orgs/json-schema-org/discussions/724 and #1505.

Depends on #1518
Depends on #1512
(will likely need rebase after those merge)

The Core spec was a lot, but it was fairly straightforward.

This PR will make a best effort to include vocabularies as they are in 2020-12 as a start. Some redesign is inevitible, but this this is a good starting point.

jdesrosiers

I've only skimmed a little so far, but one thing that stood out is that we don't necessarily need to remove the word or the concept of vocabularies generally as collection of keywords. We only need to remove the mechanism ($vocabulary) and anything that refers to the behavior of that mechanism. The spec used the term "vocabulary" before the vocabulary system was introduced and I think it would be fine if we leave a few places that use the term in the same way. It could help make this a smaller change.

gregsdennis · 2024-05-22T19:42:55Z

Yes, the word was present, but it was used very loosely. Later we added it with a very specific (and vaguely different) meaning but didn't address the existing usages. (See https://json-schema.org/draft-07/json-schema-validation.)

I figure it's better for now to just remove it completely. I don't think that the text suffers.

gregsdennis · 2024-05-22T21:25:41Z

jsonschema-validation.md

@@ -295,7 +284,7 @@ the name of a property in the instance.

 Omitting this keyword has the same behavior as an empty object.

-## Vocabularies for Semantic Content With `format` {#format}


There are considerable impactful changes for format. Specifically, schema authors will no longer have the ability to specify that format should be validated. Tool configuration is the only way to validate format when we remove vocabularies.

See discussion.

(Might not be an issue if #1520 goes through.)

jdesrosiers · 2024-05-24T14:36:58Z

I figure it's better for now to just remove it completely. I don't think that the text suffers.

That's fine with me. I didn't think the text suffered, I was just trying to make this a smaller job 😄.

gregsdennis · 2024-06-18T10:27:33Z

Reminder (to me): open a new issue to discuss the future of the keyword and link all of the issues I've been linking ☝️

jdesrosiers

This is big, so I'm going to review in several parts. This first part is for jsonschema-core.md.

I like the wording changes. I especially like that works whether the vocabulary system is a thing or not which will make it easier to merge back in at some point if it ever reaches a stable state.

jsonschema-core.md

jdesrosiers · 2024-06-19T18:08:47Z

jsonschema-core.md

+Meta-schemas are used to inform an implementation how to interpret a schema.
+Every schema has a meta-schema, which can be explicitly declared using the
+`$schema` keyword.


I think this section was effectively the introduction to and definition of the term "dialect". That ends up getting lost in the changes. Do we want to do something to include that defines "dialect"?

I'm not sure. I wonder if that's something that needs to be re-introduced with the proposal.

I think the concept of a dialect should stay. The concept isn't dependent on the vocabulary system. I explain further in my latest review.

How do you want to define "dialect"? Is it merely the collection of keywords? Open to suggestions.

jdesrosiers

Here's another partial review for the jsonschema-validation.md changes.

I think the format section still needs a little more work. I bet we can reduce the amount of changes necessary here by introducing an additional "mode" for evaluating format that people can configure. I'm thinking something like annotation-only, best-effort-assertion, and full-assertion. annotation-only would be equivalent to the format-annotation vocabulary. best-effort-assertion would be equivalent to the current configuration enabled behavior. full-assertion would be equivalent to the format-annotation vocabulary.

jdesrosiers · 2024-06-19T18:32:15Z

jsonschema-validation.md

 parties, schema authors SHALL NOT expect a peer implementation to support such
-custom format attributes. An implementation MUST NOT fail to collect unknown
-formats as annotations. When the Format-Assertion vocabulary is specified,
-implementations MUST fail upon encountering unknown formats.
+custom format attributes.


I think that given the changes in this section, we would always expect implementations to "support" custom format attributes, we just wouldn't expect them to validate to anything other than true. Since partial and even no validation is allowed for any format attribute, always returning true is a valid way to "support" that attribute.

I think the point of this sentence is to say that you can't expect it to assert the way you expect, but that ends up being the case for any format attribute including the ones defined in the spec. So, I'm not sure what we should do with this.

The point might be moot with #1520.

Maybe it's enough to say that for unknown formats, an implementation must collect the value as an annotation and provide a passing validation result.

How does #1520 makes this irrelevant? Isn't that just about what the default behavior is? We still have multiple behavior that we have to define in consistent way.

Maybe it's enough to say that for unknown formats, an implementation must collect the value as an annotation and provide a passing validation result.

That would work.

Maybe this is the solution here, too?

jdesrosiers · 2024-06-19T18:34:44Z

jsonschema-validation.md

-this keyword, it is RECOMMENDED to define additional keywords in a custom
-vocabulary rather than additional format attributes if interoperability is
-desired.
+An implementation MUST NOT fail to collect unknown formats as annotations.


We've been qualifying this with "(if the implementation supports annotation collection)" everywhere. Should we be doing that here as well?

Possibly. I think this is a candidate for "things that need cleaning up later".

jdesrosiers · 2024-06-19T18:41:52Z

jsonschema-validation.md

+When configured for assertion behavior for `format`, implementations MUST fail
+upon encountering unknown formats.


We say that even when assertion behavior is enabled, implementations might still just return true, so why must an implementation fail when encountering an unknown format rather than returning true? Or should they fail if they don't have a best-effort implementation for the format whether it's known or not?

I think this requirement made sense applied to the format-assertion vocabulary, but doesn't really work in this context.

The answer to the question is: which is more acceptable, a false negative or a false positive? I landed on the side of false negative being more acceptable.

I think it's better for an implementation to default to failing a schema that contains a format it doesn't know rather than to pass it. Maybe it's one of those "refuse to process" scenarios?

I wasn't advocating for any particular behavior. I was just pointing out that the text is contradictory.

But, my opinion on the behavior is that in assert mode it should either:

Allow partial/no validation and unknown formats return true; OR

Require full validation and refuse to process unknown formats

Allowing no validation and refusing to process unknown formats doesn't make sense.

Do we just remove assertion completely, then, and let implementations figure out what that mode means? (Then separately we handle #1520.)

As long as we define something before we release, I'm fine with leaving this somewhat undefined for now.

I think I want to leave this for #1520 to figure out. For now, I just want to pull out the vocabulary language. We'll definitely address it before release, though.

jdesrosiers · 2024-06-19T18:46:18Z

jsonschema-validation.md

@@ -969,40 +883,6 @@ draft-bhutton-json-schema-01, June 2022,
 Hoehrmann, B., "Scripting Media Types", RFC 4329, DOI 10.17487/RFC4329, April
 2006, <<https://www.rfc-editor.org/info/rfc4329>>.

-## [Appendix] Keywords Moved from Validation to Core


I'm not sure removing this section is related to extracting vocabularies, but I fully agree with removing it.

Hm... I'm not sure. It probably got caught up from one of my more aggressive change experiments I have locally.

Still, I agree it can be removed. I'm not fussed about removing it in this PR if you're not.

jdesrosiers

Here's the last part of my review including the adr and proposal.

jdesrosiers · 2024-06-20T22:48:47Z

proposals/vocabularies-adr.md

+Vocabularies are external documents that describe how new keywords function.
+They can be in a specification style, or a blog post, or some other format.


We've always defined vocabularies as a collection of keywords. I'd call the document that describes a vocabulary the vocabulary specification. I prefer the original definition. Also, I don't think this one is accurate for the vocabularies we define in the spec. Those wouldn't be "external" documents.

That's fair. I'll massage this.

The vocabs that I've created (and the OpenAPI one) are definitely in the specification style, but I remember discussing it with Henry, and he mentioned that it didn't always have to be. It just had to be defined somewhere people who wanted to implement it had access to. That could be a blog post or an internal wiki or whatever.

I think that allows for wider participation in defining new keywords than writing a spec-like document would because most people wouldn't bother with the precision.

jdesrosiers · 2024-06-20T22:55:05Z

proposals/vocabularies-adr.md

+An implementation declares support for a particular vocabulary via
+implementation of its keywords and documentation.


I don't think this sentence makes sense. Implementing something isn't an implicit declaration that it as been implemented. An implementation would have to declare that they implemented something in their documentation or something similar. Also, you don't implement documentation, you implement the keywords in accordance with the vocabulary documentation.

Also, you don't implement documentation, you implement the keywords in accordance with the vocabulary documentation.

... "via (implementation of its keywords) && documentation"

😆 It makes so much more sense now. Still, that sentence could be reworked so it's harder to misunderstand.

jdesrosiers · 2024-06-20T23:17:11Z

proposals/vocabularies.md

+Two new concepts, vocabularies and dialects, will be introduced into the Core
+specification.


The concept of dialects wasn't removed from the spec and I don't think it needs to be. The concept of a dialect exists independently of vocabularies. It's a set of keywords and semantics associated with a distinct flavor of JSON Schema. That can be declared using the vocabulary system, but it doesn't have to be. For example, I'd call both OpenAPI Schema 3.0 and 3.1 dialects of JSON Schema. One uses the vocabulary system and the other doesn't, but in the end, the result is the same, a set of keywords and semantics.

I'd describe the vocabulary system as a way to formally declare a dialect.

jdesrosiers · 2024-06-20T23:25:00Z

proposals/vocabularies.md

+}
+```
+
+A dialect is the set of vocabularies listed by a meta-schema. It is ephemeral


A dialect is the set of vocabularies listed by a meta-schema.

If we turn this around, it aligns with my understanding of what a dialect is.

Suggested change

A dialect is the set of vocabularies listed by a meta-schema. It is ephemeral

The set of vocabularies listed by the meta-schema defines a dialect. It is ephemeral

jdesrosiers · 2024-06-20T23:41:48Z

proposals/vocabularies.md

+A dialect is the set of vocabularies listed by a meta-schema. It is ephemeral
+and carries no identifier.


It is ephemeral and carries no identifier

The way I've always understood it is that the meta-schema URI is the dialect identifier. More accurately, the meta-schema uses the dialect identifier as it's identifier because the dialect identifier is independent of the meta-schema. The $schema keyword serves two purposes. The first is to identify the dialect of the schema, the second is to identify a meta-schema that can be used to validate the schema according to that dialect.

Even if the implementation is reading the meta-schema in order to bootstrap support for the dialect, the dialect URI still identifies the set of vocabularies that the meta-schema declares.

https://github.com/json-schema-org/json-schema-spec/pull/1510/files#r1648263398 is definitely informing your comment here.

Maybe. And certainly it's influenced by the architecture of my implementation to some degree, but it also comes from the spec Section 8.1.1

The "$schema" keyword is both used as a JSON Schema dialect identifier and as the identifier of a resource which is itself a JSON Schema, which describes the set of valid schemas written for this particular dialect.

jdesrosiers · 2024-06-20T23:46:46Z

proposals/vocabularies.md

+_**NOTE** It is possible for two meta-schemas, which would have different `$id`
+values, to share a common dialect if they both declare the same set of
+vocabularies._


I'm not sure I'd say it like that. I would consider two dialect that declare the same set of vocabularies to be two distinct dialects that just happen to be semantically and syntactically identical.

Just like I'd consider these two different interfaces even thought they're identical.

interface A { foo(n: int) => int } interface B { foo(n: int) => int }

I think dialects are more of a duck-typing scenario.

I think this is splitting hairs, but I can remove this and open an issue to decide how we want to define it. I think it's ultimately just up to us. There is probably prior art in the industry that argues either way.

jdesrosiers · 2024-06-20T23:54:48Z

proposals/vocabularies.md

+schema is being validated by a meta-schema that includes the vocabulary. (A
+vocabulary schema is not itself a meta-schema since it does not validate entire
+schemas.) To facilitate this extra validation, when a vocabulary schema is


We've always defined a meta-schema to be a schema that describes a schema. These vocabulary schemas do fit that definition. I understand what you're trying to say here, but I don't think saying it's not a meta-schema is the right approach. Didn't Henry update the spec at some point with a way to describe this behavior without having to say it's ignored or it's not a meta-schema?

He didn't update the spec or anything. This actually came out of reviewing @jviotti's book and trying to rework my vocab schemas. I wrote about it here, and we pulled out $vocabulary from them in #1460, #1461, and #1462.

They're not meta-schemas because they don't themselves describe full schemas; they are used by meta-schemas as components.

I don't think we're quite talking about the same thing. I entirely agree that vocabulary meta-schemas shouldn't include the $vocabulary keyword.

They're not meta-schemas because they don't themselves describe full schemas they are used by meta-schemas as components.

This distinction is what doesn't sit right with me. The way I see it, a component of a meta-schema is still a meta-schema. Section 8.1.2.2 is what I thought you were referring to here. Is that correct?

It does seem like the wording there is not considering schemas referenced by a meta-schema a meta-schema, but that's never how I've understood the word or how we've defined the word. I've always used the terms dialect meta-schema and vocabulary meta-schema. The $vocabulary keyword is only meaningful when the meta-schema is used as a dialect meta-schema.

No, I wasn't referring to that section.

I'm looking at Core 4.3.4 which defines "meta-schema":

A schema that itself describes a schema is called a meta-schema.

A vocabulary schema doesn't describe a schema, therefore it's not a meta-schema.

You wouldn't use the Meta-data vocab schema as a meta-schema; you reference it from a meta-schema.

Interesting. We're using the same definition, but interpreting it differently. The way I see it, a vocabulary schema is validating a schema. It validates the syntax of keywords in a schema. I don't think the definition implies that a schema is only a meta-schema if it describes the entire dialect used by the schema.

It is validating keyword usage inside of a schema, but it is not validating a schema. It's a small distinction, but I think it's worth noting.

Calling the vocabulary schema a "meta-schema" is what led me to create my first vocab schemas as meta-schemas, which multiple people recognized as the wrong way to do it. I didn't understand why until Juan started to dig into it for his book. I realized my mistake and started calling it a "vocab schema" from then on. If I got confused by this, others doubtlessly will (or have already) as well. I just want the distinction to be apparent that this isn't used the in the same way that a full meta-schema would be.

A meta-schema is something that you'd reference with $schema. You wouldn't put a vocabulary schema ID in $schema; its purpose is different.

I had the same realisation while implementing vocab support - which leads to another question. Do we need to publish a canonical vocab metaschema schema?

Do we need to publish a canonical vocab metaschema schema?

I'm not sure what you're leading toward. Are you questioning the current practice of defining a vocab schema? Or are you saying that we should have a vocab schema that can function as a meta-schema?

jdesrosiers · 2024-06-20T23:59:16Z

proposals/vocabularies.md

+In short, if a schema uses a keyword from an unknown _optional_ vocabulary, the
+implementation cannot proceed because unknown keywords are explicitly
+disallowed. However, not being able to proceed with evaluation is the behavior
+prescribed for _required_ vocabularies. Thus, if the behaviors for required and
+optional vocabularies is the same, then the boolean value is moot, which
+highlights that the structure of `$vocabulary` needs to be reconsidered.


I suggest leaving this part out and leaving it at referencing the discussion. There are at least two perspectives on this and if we present one of them we should be presenting all of them. I don't think this is the place for that, so I suggest just dropping this paragraph for now.

jdesrosiers · 2024-06-21T00:10:40Z

proposals/vocabularies.md

+Moreover, such a system cannot provide metadata about the keywords. As such, the
+user must explicitly ensure that the implementation recognizes and supports the
+vocabulary, which isn't much of an improvement over the current state.


That statement doesn't make sense to me. I don't see how the vocabulary description document has any effect for users other than reducing the work they need to do when adding support for their vocabulary in the library they're using.

There was always an intent to build in some kind of machine-readable vocbaulary meta-data document. What stalled that is that we couldn't identify what that meta-data should be.

I'm merely mentioning that the current solution doesn't address that intent, and whatever we come up with should. I think it's fine to keep in here for now.

The last sentence is just stating that the current state of having to check if a vocab is supported is no different than having to check if explicit keywords are supported. Having a machine-readable description document could automate some of that in some cases and ease the burden of the user.

I'm not disputing that vocab description document wasn't always the intention or that it doesn't have value. It's very valuable and I'm 100% on board with having one. I see it enabling all kinds of things. It's just the last sentence that doesn't make sense to me. I don't see how a vocabulary description document can influence the need for a user to check if an implementation supports a vocabulary or not.

The idea was that an implementation could potentially "read in" a vocabulary and support it dynamically. I think in practice this would be limited to vocabs which only defined annotation-only keywords, but that was the thought.

…own to an implementations as 'recognized'

Co-authored-by: Jason Desrosiers <[email protected]>

gregsdennis · 2024-06-21T04:52:01Z

I've rebased this on top of #1512. Starting to address comments now.

gregsdennis changed the title ~~remove vocabularies from the specs~~ Extract vocabularies from the specs May 22, 2024

jdesrosiers reviewed May 22, 2024

View reviewed changes

gregsdennis commented May 22, 2024

View reviewed changes

gregsdennis force-pushed the gregsdennis/extract-vocabs branch from 0e9d125 to a3d2698 Compare May 22, 2024 23:26

gregsdennis force-pushed the gregsdennis/extract-vocabs branch from a3d2698 to c96c349 Compare June 10, 2024 01:10

gregsdennis marked this pull request as ready for review June 11, 2024 01:16

gregsdennis requested a review from a team June 11, 2024 22:45

gregsdennis added this to the stable-release milestone Jun 11, 2024

gregsdennis self-assigned this Jun 11, 2024

This was referenced Jun 18, 2024

Clean-up of stable features and removal of unstable features #1444

Open

Describe what could be at the other end of a $vocabulary URI #1105

Closed

Relequestual added the agenda For the OCWM agenda label Jun 18, 2024

gregsdennis mentioned this pull request Jun 18, 2024

Simplify the processing of "$vocabulary" #1281

Closed

jdesrosiers reviewed Jun 19, 2024

View reviewed changes

This was referenced Jun 20, 2024

Wild idea for vocabs #1523

Open

Proposal: Make format validate by default #1520

Open

jdesrosiers reviewed Jun 21, 2024

View reviewed changes

gregsdennis added 5 commits June 21, 2024 16:39

remove support for unrecognized keywords

4cfe452

update unrecognized section to include all keywords which could be kn…

e3a5aa2

…own to an implementations as 'recognized'

remove vocabularies from core

90f1484

remove vocabulary usages from validation

46d2dec

start on proposal (with adr)

a908f18

gregsdennis and others added 4 commits June 21, 2024 16:50

tweaks to problem statement

cbaa513

add vocabulary proposal doc

3c5ff2a

Update jsonschema-core.md

599487b

Co-authored-by: Jason Desrosiers <[email protected]>

Update jsonschema-core.md

bc542ee

Co-authored-by: Jason Desrosiers <[email protected]>

gregsdennis force-pushed the gregsdennis/extract-vocabs branch from 8faa17e to bc542ee Compare June 21, 2024 04:51

gregsdennis added 2 commits June 21, 2024 17:46

addressing PR comments

651f22d

forgot to save a file before committing

1bc2f80

This was referenced Jun 21, 2024

Specification Flow could be improved (but what do we work on next?) #1527

Open

Reorganizing the meta-schemas #1159

Open

		@@ -295,7 +284,7 @@ the name of a property in the instance.

		Omitting this keyword has the same behavior as an empty object.

		## Vocabularies for Semantic Content With `format` {#format}

		When configured for assertion behavior for `format`, implementations MUST fail
		upon encountering unknown formats.

		Vocabularies are external documents that describe how new keywords function.
		They can be in a specification style, or a blog post, or some other format.

		An implementation declares support for a particular vocabulary via
		implementation of its keywords and documentation.

		Two new concepts, vocabularies and dialects, will be introduced into the Core
		specification.

	A dialect is the set of vocabularies listed by a meta-schema. It is ephemeral
	The set of vocabularies listed by the meta-schema defines a dialect. It is ephemeral

		A dialect is the set of vocabularies listed by a meta-schema. It is ephemeral
		and carries no identifier.

Extract vocabularies from the specs #1510

Are you sure you want to change the base?

Extract vocabularies from the specs #1510

Conversation

gregsdennis commented May 22, 2024 • edited Loading

jdesrosiers left a comment

Choose a reason for hiding this comment

gregsdennis commented May 22, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jdesrosiers commented May 24, 2024

gregsdennis commented Jun 18, 2024

jdesrosiers left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jdesrosiers left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregsdennis Jun 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregsdennis Jun 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregsdennis Jun 20, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregsdennis Jun 19, 2024 • edited Loading

Choose a reason for hiding this comment

jdesrosiers left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregsdennis Jun 21, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregsdennis commented Jun 21, 2024

gregsdennis commented May 22, 2024 •

edited

Loading

gregsdennis Jun 19, 2024 •

edited

Loading

gregsdennis Jun 19, 2024 •

edited

Loading

gregsdennis Jun 20, 2024 •

edited

Loading

gregsdennis Jun 19, 2024 •

edited

Loading

gregsdennis Jun 21, 2024 •

edited

Loading