Does re-use of the same MIME types constitute a breaking change? #141
Comments
Thank you for the analysis. We do have https://www.w3.org/TR/rdf12-turtle/#changes-12 and the matter will be in "RDF 1.2 New". The WG is discussing levels of conformance.
Related: https://www.w3.org/TR/rdf12-turtle/ -- currently, the 1.2 working draft. This will be the REC when published. Title "RDF 1.2 Turtle".
Interesting; may I suggest a standards-based mechanism for agents to indicate this level? (A media type or profile comes to mind.) Or would "classic conformance" de facto amount to parsing only the RDF 1.1 subset (in which case it would be equivalent to one of the points above)? That does not seem to be the case, however, with for instance base directions being added to literals (in which case "classic" might be a confusing/misleading term).
[This is not a WG response] Any approach to versioning has costs on both the reader side and the writer side. For example, anything in the HTTP header that makes the data consumer's task easier puts a requirement on the data producer. In the same way that RDF 1.2 syntax can appear a long way into the delivered stream for a reader, an HTTP header carrying the version makes the writer's life harder, because it may need to see all the data first: no stream writing without recording the details of which version is in the data, which would also be a producer-side burden.

One way to publish data is using a web server's support for mapping file extensions to content types. Today, a toolkit may need to "know" to look in the URL to get the content type if no realistic content-type is available (a sketch of that fallback follows below). Given the file extension situation, I think any solution will not help RDF that much. Software will want to handle the static/non-profile/file-extension/... cases anyway. Only a domain-specific (i.e. consumer and producer) deployment can be sure the global rules are in play.

There is a trade-off of whether the long-term continued provision of a migration solution is a greater burden than the evolution itself. Such migration support should never be withdrawn -- "The web is not versioned".
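A minimal sketch (not from the thread) of the file-extension fallback described above; the mapping table and function names are invented for illustration:

```typescript
// Illustrative extension-to-content-type table, as a web server or
// toolkit might maintain it.
const EXTENSION_TO_CONTENT_TYPE: Record<string, string> = {
  ".ttl": "text/turtle",
  ".nt": "application/n-triples",
  ".trig": "application/trig",
};

// When a server returns no usable Content-Type, a toolkit may fall back
// to sniffing the URL's file extension.
function guessContentType(url: string, headerValue?: string): string | undefined {
  if (headerValue && headerValue !== "application/octet-stream") {
    return headerValue.split(";")[0].trim();
  }
  const match = new URL(url).pathname.match(/\.[^./]+$/);
  return match ? EXTENSION_TO_CONTENT_TYPE[match[0]] : undefined;
}
```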
Thanks, @afs. I want to leave space for others so will be brief, but quickly:
Maybe an optional version or feature declaration, to support fail-fast detection? (A sketch follows below.) With the implicit default being "latest REC". It should perhaps be clearly stated that implementations are required to follow the evolution of the format, with the reciprocal requirement of evolving the format responsibly, aspiring to standardize once "sufficient" implementation coverage has been established. AFAIK, there is a requirement of multiple independent implementations; perhaps that number should be a function of the "cardinality of known deployments" and "how viable it is to upgrade them"? (I know it is a practical impossibility to quantify that at web scale, but it goes to show the complexity underlying these judgement calls. And that we (W3C members) have a responsibility to care and cater for cooperative evolution to ensure web interop.)

I think this follows the conventions @afs referenced, which is a trade-off I'm cautiously in agreement with. Defining a new format (mime-type + suffix) is the only other viable option AFAICS; and while that caters for more overlap in deployments, it also induces a certain inertia and growing technical debt. (When is the previous format "sunset"? How is the data quality impacted during the overlap period? How do applications take the difference in expressivity into account?) I see no practical way around some form of social contract, as even content negotiation is not merely technical.
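To make the fail-fast idea concrete, a hedged sketch: the `@version` directive below is purely hypothetical (it appears in no published spec), and absence defaults to "latest REC" as suggested above.

```typescript
// Versions this (hypothetical) consumer knows how to parse.
const SUPPORTED_VERSIONS = new Set(["1.1", "1.2"]);

function checkDeclaredVersion(firstLine: string): void {
  // e.g. a document starting with:  @version "1.2" .
  const match = firstLine.match(/^@version\s+"([^"]+)"/);
  const declared = match ? match[1] : "1.2"; // implicit "latest REC"
  if (!SUPPORTED_VERSIONS.has(declared)) {
    // Fail fast, before streaming gigabytes of data.
    throw new Error(`Unsupported Turtle version: ${declared}`);
  }
}
```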
The key difference being that—for example—HTTP, HTML, and CSS have explicit behaviors for dealing with unsupported constructs. HTTP proxies have rules for unknown headers, HTTP has version negotiation, HTML has rules for unknown tags and attributes, CSS has rules for unsupported properties and even syntax. So the Web's ability to be non-versioned is baked into the design of those technologies. Conversely, RDF adopting the non-versioned philosophy does not equate to doing nothing on the feature support/versioning front, but rather to being very explicit about how non-versioning is to be made possible. In summary, not doing anything puts us on neither a versioned nor a non-versioned trajectory. They are not binary opposites; the third option, "incompatible with versioning and non-versioning", is the unfortunate default choice.
to take a concrete example as precedent, i do not recall that, in the transition from sparql 1.0 to sparql 1.1, the continued use of the same media type designators was problematic. in what sense, other than the concern about "late failure" for large documents, should that matter for document media types?
Apples and oranges.
So the upgrade path of SPARQL is much more similar to that of SQL, with similar challenges and non-issues. And quite a pain in practice: one typically needs to know out-of-band what precise SPARQL endpoint software an interface is running, which determines how well certain SPARQL 1.0 or 1.1 features are supported. In contrast, at least today, a `text/turtle` document can be interpreted on its own.
RDF is about enabling interoperability. Yes, on the semantic level, but not having interoperability on the syntactical level precludes that. In the pre-1.1 days, "Turtle" had been around as a format for over a decade, and parsers were incompatible with each other. It was quite the nightmare, trying to exchange data or write parsers. There was no established (let alone standard) way of knowing what subset was supported by everyone. The Turtle standard solved this by bringing certainty about what is and isn't `text/turtle`. The proposed re-definition of `text/turtle` takes that certainty away again.
One might not even know. One could've parsed a 1.2 document wrongly without ever knowing. One could've rejected or accepted a document based on the wrong assumption (because assumptions are all you have, in band). One doesn't know if downstream systems are compatible with 1.1 or 1.2, because they can't tell. It's an absolute interoperability nightmare that systems don't even have the words to express what they do and do not support. In a context where we're advocating for semantic interoperability, failing at syntactic interoperability is a serious flaw from a technical and strategic perspective. It adds a serious degree of brittleness, the details of which only a small group of people understand, which carries a major risk of reflecting badly on RDF as a whole for not being a sustainable—let alone interoperable—technology. People will say that RDF doesn't work reliably across systems, and they will be right.
that may be, unless one is concerned with sparql processors.
we agree - vehemently.
2 cents from someone who did implement a non-standard RDF format that has an analogue of Ruben's proposed feature/version declarations. I ended up making the serializer always claim that all features are used by default. Then, it's up to the user to tell the serializer that "this and that" feature won't be needed. This creates an obvious compatibility problem, because parsers will simply refuse to read these files, even though in practice the feature may not be used. I have not found a better solution to this problem. I think this is a sensible compromise for my ugly format, but I would be against this in W3C formats. More details here. (A sketch of this default-to-everything behavior follows below.)

Overall, I think a sensible solution would be to embrace the mess and just live with the fact that RDF formats can evolve. I would also like to ask the WG to kindly consider producing some "best practices" for how to mark that an RDF file is 1.2, in a use-case-specific manner. I like the suggestion from @lisp for adding some info in graph store protocol descriptions. I'm also curious whether something like a non-mandatory HTTP header would be an option. Or maybe a comment at the start of the file (like a shebang in .sh files) – of course, entirely optional. (disclaimer: I did not think these ideas through, they may be VERY bad)
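A sketch of the "declare everything by default" compromise described above; the interface and feature names are invented for illustration, not the actual format's API:

```typescript
// Feature flags a serializer might declare in a file header.
interface FeatureFlags {
  quotedTriples: boolean;
  baseDirection: boolean;
}

// The serializer claims all features unless the user explicitly opts out,
// so a parser that trusts the declaration may reject files that never
// actually use the feature.
function declaredFeatures(userOptOuts: Partial<FeatureFlags> = {}): FeatureFlags {
  return {
    quotedTriples: userOptOuts.quotedTriples ?? true,
    baseDirection: userOptOuts.baseDirection ?? true,
  };
}
```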
Intuitively, it sounds to me like TTL documents that use any of the new features need a new media type and file extension.
legacy software will not see them.
Isn't the situation with Turtle 1.1 and Turtle 1.2 a bit like Turtle and TriG? In both cases the former syntax is a subset of the latter. |
Consuming data which is suddenly TriG rather than the expected Turtle would break in the same way.
Hi. My thoughts on this from a practicality perspective: I echo Ruben's argument that we should be aiming to support interoperability and backwards compatibility - especially when we know exactly how and why an existing system will break due to new changes.

For Turtle, the mime type can be versioned - there is precedent for this if we look at existing mime types.

If we don't version the mime type, existing systems will break. They will need to be updated to support Turtle 1.2. There is no way to distinguish between Turtle 1.1 and Turtle 1.2, so there is no way for them to silently fail or ignore Turtle 1.2. There also isn't a way to fail with context (i.e. "failed because it doesn't handle Turtle 1.2") - it will fail equally for valid Turtle 1.2 and invalid Turtle 1.1. So this is not a trivially fixable change. Not desirable IMO.

If we do version the mime type, existing systems will not break. If they have to support Turtle 1.2, then they MUST be updated anyway, since Turtle 1.2 requires changes anyway, and hence there is an opportunity for these systems to add the mime type handling change alongside the Turtle 1.2 handling changes. It might result in some extra work, potentially some complex cases around mime type handling. However, we know for sure that existing systems won't break (assuming the mime type is used as intended here), and if they do receive an incorrectly assigned mime type, then the fix is to use the correct mime type. So this should be the desirable state. (A sketch of version-aware mime type handling follows below.)

This also brings up the question of what should happen when Turtle 1.3 eventually arrives. Again, versioning the mime type is an option, but pragmatically, having the version in the document itself is the best forwards-compatible solution and a known best practice. It would be ideal to have it here.
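To illustrate, a sketch of version-aware media type handling, assuming a hypothetical `version` parameter (no such parameter is registered for `text/turtle`):

```typescript
// e.g.  Content-Type: text/turtle;version=1.2
function acceptsTurtle(contentType: string, maxVersion = 1.1): boolean {
  const [type, ...params] = contentType.split(";").map((s) => s.trim());
  if (type !== "text/turtle") return false;
  const versionParam = params.find((p) => p.startsWith("version="));
  // An unparameterized media type is assumed to mean Turtle 1.1.
  const version = versionParam ? Number(versionParam.slice("version=".length)) : 1.1;
  return version <= maxVersion; // older parsers can refuse 1.2 up front
}
```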
Another important consideration to take into account here is the length of `Accept` headers.

Long accept headers in browsers are problematic

The Fetch spec (CORS section) specifies that each header (including the `Accept` header) may not be longer than 128 bytes for the request to remain CORS-safelisted. As an example, the Comunica query engine uses an `Accept` header listing all RDF media types it supports, which already exceeds that limit.
Hence, when we do these requests in a browser, we must splice this `Accept` header to fit within the limit.

New media types exacerbate this problem

As such, I believe introducing new media types for each RDF serialization in 1.2 is not the right way forward. For example, an `Accept` header extended with some arbitrary new media types for 1.2 already reaches the limit according to CORS.
And this problem would only get worse with every new RDF version. (A sketch of the length check is included below.)
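A small sketch of the constraint being described, using the 128-byte limit the Fetch spec places on CORS-safelisted request headers; `spliceAccept` is an invented name for the trimming that clients like Comunica must perform:

```typescript
// True when the joined Accept value stays within the CORS safelist limit.
function fitsCorsSafelist(mediaTypes: string[]): boolean {
  const accept = mediaTypes.join(",");
  return new TextEncoder().encode(accept).length <= 128;
}

// Drop the lowest-priority media types (assumed to be listed last) until
// the header is safelisted again.
function spliceAccept(mediaTypes: string[]): string[] {
  const result = [...mediaTypes];
  while (result.length > 0 && !fitsCorsSafelist(result)) result.pop();
  return result;
}
```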
Towards a solution

My initial thought when reading this issue was that profile-based negotiation could be a good solution, but profiles are passed in headers as well, so they run into the same length limits. From this perspective, my feeling is that new media types or profile-based negotiation are not the way to go, and that in-band solutions, such as a version declaration inside the document, are preferable.

Not only does this problem apply to RDF serializations, it also applies to SPARQL result serializations: SPARQL/JSON, SPARQL/XML, SPARQL/CSV, SPARQL/TSV.
Except that again, no established frameworks (e.g. JAX-RS implementations) support it.
which is why it is better to implement the logic which verifies availability of the required media type on a higher level.
While that's true for the spec version, I don't think the same can be said for the widespread use of the Team Submission that predates the spec. The same media type was in use for years before Turtle 1.1 was introduced and brought with it changes to the syntax. I'm not sure that's reason to do the same thing again, but this isn't the first time we've been faced with this issue.
Just felt like pointing out that there's also an IETF Internet Draft on profile-based negotiation, of which @RubenVerborgh is co-author. It's been in the works for quite a long time. There's been renewed interest from the cultural heritage community and even from the W3C, where some consider this a topic that falls in the IETF realm. See https://datatracker.ietf.org/doc/draft-svensson-profiled-representations/.
Technologically? Been there, done that. Reputationally? Not so much.
@RubenVerborgh — You pointed to "Extended discussion at https://ruben.verborgh.org/articles/fine-grained-content-negotiation/".

First thing, your writing betrays a limited understanding of your topic, as you refer consistently to "MIME types", which are actually "media types", though they are used well beyond the universe of MIME. Next, I bear relatively recent scars of a years-long effort to convince IETF to follow their own documentation and work with a number of folks (including me) who wanted to extend media types by defining how to interpret multiple `profile` values. In other words -- your "extended discussion" (which is really an extended monologue) has been overtaken by events, and is no longer (if it ever was) applicable.
Summary
In rdfjs/N3.js#484, I learned that the specifications intend to redefine the set of valid documents under the `text/turtle` media type (and presumably others). Such a change might not be possible/desired, or should at least be acknowledged as a breaking change.
Definitions

- `text/turtle` as the media type defined by https://www.w3.org/TR/turtle/
- `valid-turtle` as the (infinite) set of valid Turtle 1.1 documents
- `invalid-turtle` as the (infinite) set of documents that are not in `valid-turtle`
- a spec-compliant parser as software that, given a document in `valid-turtle`, produces the corresponding set of triples; and given a document in `invalid-turtle`, rejects it (possibly with details on the syntax error)

Note here that the above definition includes rejection; the 1.1 specification text does not, its test cases do.
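The definitions above can be restated as a type signature; a sketch, with `Triple` deliberately left abstract:

```typescript
type Triple = unknown;

type ParseResult =
  | { ok: true; triples: Triple[] }     // input was in valid-turtle
  | { ok: false; syntaxError: string }; // input was in invalid-turtle

// A spec-compliant parser maps every document to exactly one of the two
// outcomes: the corresponding triples, or a rejection.
type SpecCompliantParser = (document: string) => ParseResult;
```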
Potential problems

- Redefining `text/turtle` breaks existing spec-compliant Turtle parsers, as they will incorrectly label valid `text/turtle` documents as invalid.
- What does it mean to receive a `text/turtle` document and no other context?
- Given the definition of `text/turtle` in the Turtle 1.1 spec, any changes to that set (whether deletions or additions) would contradict the Turtle 1.1 spec itself / make it invalid.
- Clients cannot tell in advance whether they will be able to handle a document: `Accept: text/turtle` does not tell them. Nor does `Content-Type: text/turtle` tell them whether their parser can handle the contents, and we could be 20 gigabytes in until we notice it doesn't. (Sketched below.)
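A sketch of the last point, the "20 gigabytes in" failure mode; the `<<` token merely stands in for any 1.2-only construct, and the chunk-wise check is deliberately naive:

```typescript
// A streaming consumer cannot reject an unsupported document up front,
// because Content-Type: text/turtle looks identical for 1.1 and 1.2 data.
async function consumeTurtle(stream: AsyncIterable<string>): Promise<void> {
  let bytesSeen = 0;
  for await (const chunk of stream) {
    bytesSeen += chunk.length;
    // Naive stand-in for "parser hits a 1.2-only construct".
    if (chunk.includes("<<")) {
      throw new Error(`Unsupported construct after ~${bytesSeen} bytes`);
    }
    // ...otherwise, hand the chunk to a Turtle 1.1 parser...
  }
}
```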
Analysis

Unlike formats like HTML, Turtle 1.1 does not contain provisions for upgrading. The specification assumes a closed set of valid documents. We find further evidence in a number of bad test cases (https://www.w3.org/2013/TurtleTests/), which explicitly consider more permissive parsers to be non-compliant.
There is a note in the spec suggesting more lenient behavior (but only a note, and thus explicitly non-normative), and this non-normative statement is contradicted by the bad test cases, which parsers need to reject in order to produce a compliant report.
Although the considered changes for 1.2 are presumably not in contradiction with those bad cases, the test suite was not designed to be exhaustive. Rather, the 1.1 specification considers `text/turtle` to be a closed set, and the test cases check a handful of examples to verify the set is indeed closed. In particular, no extension points were left open on purpose.
Therefore, the 1.1 spec is not only defining "Turtle 1.1", but also strictly finalizing `text/turtle`. (The IANA submission's reservation that "The W3C reserves change control over this specifications [sic]." does not change the above arguments.)
Potential solutions

A set of non-mutually exclusive solutions, which each cover part or all of the problem space (options 2 and 3 are sketched below):

1. Factual disagreements with the above.
2. The introduction of a new media type.
3. The introduction of a new profile on top of the existing `text/turtle` media type.
4. A change to the Turtle 1.1 spec that adds extension points or otherwise opens the set of `text/turtle`.
5. Syntactical support in Turtle 1.2 for extension and/or versioning.
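For illustration, how options 2 and 3 might look on the wire; both the media type and the profile URI below are invented, not registered values:

```typescript
// Option 2: a new media type for Turtle 1.2, with 1.1 as fallback.
const acceptNewMediaType = "text/turtle-1.2, text/turtle;q=0.9";

// Option 3: a profile parameter on top of the existing media type.
const acceptProfile = 'text/turtle;profile="https://example.org/turtle-1.2"';
```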