-
-
Notifications
You must be signed in to change notification settings - Fork 690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Defaults in Messages JSON and Messages objects #1519
Comments
Good question. So my thoughts / observations. 100% agree Description on Background - I've never once seen it in 5 years IMO almost any string field in Gherkin should be optional, save for things which come after Gherkin keywords.
|
Broadly agree with @gasparnagy regarding defaults. Specifically my feeling would be:
I think there was a functional reason for this that @aslakhellesoy mentioned the other week but I don't remember what? With descriptions in Scenario, Background et al, I think at runtime for building reports we would be checking whether there is a non-blank string (i.e. has content other than whitespace) so I don't think we'd be "saving" a check there by defaulting to an empty string. TypeScript-specific aside: if we were to drop the idea of defaults on deserialisation, the TypeScript classes generated from the JSON schema could just become interfaces, and we'd no longer have the weight of those classes in the compiled JavaScript (though this saving is probably an order of magnitude less than the one from losing the protobufjs runtime). |
Personally I have an aversion to nulls in general: they breed What's important to me is the semantics of the protocol - an empty string or array and a null mean different things, and we should be clear about which meaning we want to give. So I think we might need to look at each particular field in the messages where we might need to handle nulls, not just string/array/numbers in general. From the semantics perspective. I don't see the problem with representing "the user did not specify a description for this scenario" with an empty string. It's not like the user could tell us, in their Gherkin document, I want to have a description for this scenario, and I want it to be blank (i.e. empty string) versus I don't want this scenario to have a description (i.e. null). Either they give us a description or they don't. Similarly empty arrays seem like a fine way to represent nothingness in all the cases I can think of - e.g. the tags for a scenario. Numbers, I think are different. Zero is a fairly arbitrary default value, and implies more knoweledge of the answer than might be appropriate. A null value for Them's my thoughts, does that help? |
It looks to me that JSON Schema supports default values. See https://json-schema.org/draft/2020-12/json-schema-validation.html#rfc.section.9.2 |
I think this is easier to reason about if we think about the various roles
If we only had in-process message consumers, we wouldn't need to worry about required fields and default values. There wouldn't be any JSON serialisation or deserialisation. If the producer didn't provide a value for a property, it would either be caught by the compiler (TypeScript), or if the language doesn't prevent assigning null values to fields (Java, not sure about C#) it would likely be caught by a null reference in a test. It's primarily the out-of process message consumers we need to worry about. They should assume that the messages they consume contain garbage. Default valuesWe can reduce the amount of garbage with default values. When we parse a JSON string into an object, any property missing from the JSON would get the default value from the JSON Schema. This is similar to how the protobuf deserialisation works. After deserialisation, any missing property gets the "null" value by default ( We have a choice how we want to deal with default values. We can either mimic the protobuf behaviour (i.e. conventional default values), or we can rely on the I prefer the explicit style since it doesn't require programmers to know about any conventions. It also gives us more fine-grained control at the cost of being a little more verbose in the schema. Required propertiesDeclaring a property as required in the schema offers no guarantees to the consumer that the property will have a value in the JSON. It could however give the consumer the confidence that the deserialised object has a value for all required properties. If we want to avoid consumers littered with conditionals checking if a required property actually has a value after deserialising JSON, we can:
With this in place, consumers can safely assume that required properties will always have a value after serialisation. This means that the deserialisers need to know about the schema. What does this mean in practice?I think we should go over the schemas and apply some rules to decide whether a particular property should be
|
@aslakhellesoy Thank you for the summary. I generally agree with you, except one thing. You said, we anyway cannot "guarantee" that producers will include the required properties. I think this is not that simple. The schema is part of the specification. If we have a schema, that declares a property as "required" that means that by the specification of the cucumber messages format, the users MUST provide that value. Producers can decide to not provide the value of course, but that means their input will not conform the expectations for the message, so the result is somewhat unpredictable. (Showing a nice error message is of course nice.) If we publish the JSON schema than both the producers and the consumers can use it to (automatically) verify if the message fulfills that. This is standard for XML message validation, probably there are libraries in the different platforms where such check can be easily included in the deserialization process (ie. check if the JSON message conforms the schema and only start the deserialization if it does). Based on that, I think we should declare in the schema what we expect and not what we might get. I.e. if a property is not required (e.g. description), than we should not mark it as "required" in the JSON schema, because with that we confuse the ones who want to use the schema as a contract (i.e. the message would fail the schema validation although the implementations would accept it). Because of that, I would NOT have anything as I agree to the explicit mention of the default values (wherever possible), and I think we have a consensus on using empty string for string properties and empty array for lists as default. The only one I puzzle with is the |
@gasparnagy having played around a bit recently with this. Do we think this is still an issue, and if so should we move it to the messages repo? I noticed things like location often are half filled (So not all the props are passed in), and things seem to be working well. Do we think we need additional work here>? |
I haven't checked the JSON schema recently, so I don't know. All I wanted is that if we make a specification about the messages (like a JSON schema) that should be valid for all the JSON messages we send across. So if we don't typically send e.g. |
I want to share my thoughts on this topic, as a kind of discussion forum here.
Defaults in Messages JSON
As I worked a bit with the new messages structure, I would like to bring up (again) the problem of defaults in the messages protocol. Currently I have found 3 cases that were not fully consistent with my understanding:
These are currently set as required in the JSON schema, although they are optional in Gherkin and in many cases they are usually not provided (e.g. I have never seen anyone adding a name or description to a Background, but also Scenario description is rarely used). Forcing ourselves to include a dummy empty setting for them in the messages JSON files seems to be unnecessary for me.
As far as I have understood @aslakhellesoy, the original intention was to populate these properties for dynamic languages, but with TS this does not seem to be necessary anyway.
A similar question could be raised for the empty arrays, but those are more correct semantically.
Location.column is currently set as optional (inconsistency with the others)?
Also I feel that if we make "set everything as required" as a rule for the messages JSON, then if later we introduce a new (optional) setting (and make it required by following our rule) then the old JSON message files will became invalid according to the schema, so it will be harder to introduce non-breaking changes.
So my questions are:
Defaults in Messages objects
If we would make these settings as optional in the JSON schema, we could still think about the default value that we use in the populated message objects if the particular field was not specified. As far as I know JSON schema does not provide an option to explicitly specify the default value, so this can only be an agreement that the different implementations follow.
I think here there are 3 questions:
What do you think?
The text was updated successfully, but these errors were encountered: