-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"Persisted queries" / "stored operations" #38
Comments
@n1ru4l points out a more correct term is actually "stored documents" |
We are currently building GraphQL Yoga v3 plugins for Automatic persisted queries and also traditional built-time extracted persisted documents. I see Automatic persisted queries as an upstream client-to-origin payload size optimization and stored/persisted operations/documents as a security feature for preventing the processing of arbitrary GraphQL documents. It seems like all urql and apollo-client have built-in support for Automatic persisted queries (which itself does also not have a real specification) and the "extensions-protocol" became the de-facto standard for both APQ and persisted document implementations (see mercurius). However, we would rather prefer to use something for persisted operations that actually follows a specification. IMHO the GraphQL over HTTP specification would be perfect for this. In enisdenjo/graphql-ws#188 (comment) there has been a hint that this could simply be solved by introducing an |
I may not be able to make it to the October 2023 meeting, but I'd like to supply my comments for the suggested Persisted Operations RFC. I'll provide comments from two viewpoints: as a GraphQL.NET maintainer, and as a user. Note: see @benjie 's comments here As a GraphQL.NET maintainer...As a GraphQL.NET maintainer, the suggestion of storing the .NET is strongly typed, and GraphQL.NET includes a type that matches the form of the GraphQL request, having four properties: The existing APQ code is configured by the user within this request chain, which intercepts the request, either (a) responding directly for invalid doc IDs, (b) pulling the query from the cache if the doc ID is known, or (c) passing through requests that do not contain APQ metadata. Note that this APQ middleware is entirely separate from GraphQL.NET Server, as it can be applied to any transport - most notably, WebSocket protocols. GraphQL.NET also supports caching the parsing and validation steps of queries. This is done independently of APQ and works for both APQ queries and non-APQ queries. This is just another step in the configured execution chain. This all works well as Apollo's APQ specification stores the APQ metadata within the Second, GraphQL.NET caches the original query when APQ is in use so that it can be used for error messages pursuant to the GraphQL specification for the Third, @benjie wrote:
I'm assuming this refers to the two Apollo feature sets, both of which I believe are implemented the same at the protocol level. As such, I believe this comment lies outside the discussion here. Please correct me if I'm wrong. As a GraphQL user...Please keep in mind that the following are personal opinions, largely based on my specific use case: The specification indicates that the hash should be comprised of the query "stripping ignored tokens". This is a major problem for my e-commerce site as it requires that the client contain code to parse and recreate a GraphQL Document. GraphQL's beauty is in its simplicity. All browsers, even IE, support native JSON parse and stringify operations. And a GraphQL request/response comprises of a query in the form of a static text string combined with some variables, POSTed to a server, and the response interpreted as JSON. While it is true that many people likely use Apollo's GraphQL client library (which references So the persisted queries specification as written in the current RFC would not be feasible for me, whereas the Apollo APQ specification would only require a small amount of additional client code. Note that all current browsers can natively perform SHA hashes, and can do so within another cpu thread as it is an asynchronous API. So the hash algorithm would have minimal effect on page download / display time. I suppose ideally the hash would be determined at compile-time and embedded directly in to the source code of the client application. I have not investigated the difficulty required to implement this. I also have intranet applications, where these constraints do not exist to the same extent. While using persisted queries should be a net benefit there, they also are expected to be used by users that have corporate high-bandwidth connections, and as such the additional bandwidth required per GraphQL request makes little difference. ConclusionWhile I endorse having a standardized protocol for persisted queries, I would recommend following Apollo's design choices regarding (a) persisted-query metadata stored within |
I appreciate the insights, thank you!
I would like to avoid
I am not married to the idea of stripping ignored tokens/... my main thinking here was that we could both verify that a SHA is valid and that it reduces potentially duplicate operations. Nothing here sais that the tokens should be stripped at runtime, I have seen approaches with
My main reasoning for I guess maybe, after evaluating some of these points, we need to omit APQ for that reason, encouraging folks in a good direction. |
100%; we must not use the We're actually dealing with two concretely different types of request: type GraphQLRequest {
query: string;
variables?: Record<string, any>;
operationName?: string;
extensions?: Record<string, any>;
}
type GraphQLPersisitedDocumentRequest {
documentId: string;
variables?: Record<string, any>;
operationName?: string;
extensions?: Record<string, any>;
} (Note When your server receives a So "persisted documents" is effectively a middleware in front of a GraphQL service - the incoming request is decoded as
I really don't think this would be necessary; if, after reading the above, you still think it is necessary, please could you expand?
We should not strip the tokens; we should hash the raw document. Clients may choose to generate a minimized document and use that as the input to persisted documents; that's entirely up to them, but is not something we should specify.
APQ is an Apollo feature that negotiates the storage of a query as a hash for bandwidth optimization. Persisted documents is a general GraphQL principle that has existed much longer (independent of Apollo; in fact used inside Facebook before GraphQL was even open-sourced as far as I am aware) and has significantly greater benefits - not least in terms of security and cacheability of requests. Relay uses the I'm proposing that we specify persisted documents as a document allowlist; and then we express an equivalent of APQ as an optional extension of that, but noting that it loses a lot of the benefits. In my opinion the vast majority of GraphQL users (everyone whose GraphQL API is for their own apps and websites only; not intended to be used by third parties) should be using persisted documents, and only a subset of the remainder (e.g. GraphQL APIs designed to be used by third parties, such as the GitHub GraphQL API) should be using APQ or something similar.
Please note that RFC documents get merged without much in the way of review; this isn't even officially a Stage 0 proposal yet; it's definitely not "the specification" for persisted documents. Relay specifies an MD5 hash: https://relay.dev/docs/guides/persisted-queries/ I think we would recommend the usage of a prefix; e.g.
"Stripping ignored tokens" should 100% not be a requirement to use persisted operations; any transform that the client wishes to make to the document before hashing it/storing it should be up to them. I should note that in most cases, for "persisted operations" (and, again, not for APQ, which lacks many of the benefits of persisted operations) the hashing of the document should take place at build time, so there is no requirement to include any additional data in your bundle other than the hashes of the operations you're using. For some clients; you don't even need to include the operations themselves any more, just the hashes, so persisted operations could make your bundle smaller!
💯 If we specify APQ, it should be done with significant indication of the downsides of using it. It's a handly bandwidth optimization for public GraphQL APIs, but almost all other APIs should be using persisted documents as an operation allowlist. |
I've written up my first draft of spec edits to accommodate persisted documents: #264 (Note I've also added a definition of a "persisted operation" in that to be a "persisted document" containing exactly one operation - I think persisted operations are the most typical usage today.) |
I'll expand, but it's more of an implementation detail of GraphQL.NET and may not apply to other languages. We can change the design easily enough to accommodate the specification. So far, this is how GraphQL.NET wires up requests: graph TD;
subgraph a [Server]
A[Server deserializes request]
end
A-->|ExecutionOptions instance| C;
subgraph b [Optional middleware]
C[Configure custom validation rules]-->D
D[Handle APQ support]-->E
E[Handle document caching]-->F
F[Configure authentication]-->F2
F2["Start tracing/logging"]
end
F2-->|ExecutionOptions instance| G
subgraph c [Execution engine]
G["Parse (if applicable)"]-->H
H[Validate]-->Execute
end
Execute-->|ExecutionResult instance| Tracing
subgraph d [Optional middleware]
Tracing["Add tracing data / finish logging"]
end
Tracing-->|ExecutionResult instance| Response
subgraph e [Server]
Response[Server serializes response]
end
Middleware is configured as simple functions that accept .ConfigureExecution(async (options, next) =>
{
// record time execution started
var start = DateTime.UtcNow;
// enable metrics
options.EnableMetrics = true;
// execute next step in the chain
var ret = await next(options).ConfigureAwait(false);
// add apollo tracing data to execution result
ret.EnrichWithApolloTracing(start);
// return execution result
return ret;
}); As is the case now, and as noted by @benjie above, the design should be that "'persisted documents' is effectively a middleware in front of a GraphQL service" - and so it would be required that Notice that within GraphQL.NET, the server deserializes the request as it is first received, prior to being passed to any middleware. This allows for various forms of requests, such as |
I agree that if persisted queries are part of the official spec, then it is better that they have their own property within |
@Shane32 Thanks for sharing the details! It does look like you'd need to change GraphQL.NET a little to adopt this. One option would be to add a "request coercion" phase at the front which takes an IncomingRequest (this should be transport agnostic so that it works for websockets/HTTP/etc) and a PartialExecutionOptions and calls a number of "request coercion middlewares" (including the builtin one which just copies things over) which populate the PartialExecutionOptions from the IncomingRequest. That's when you'd do the APQ dance and other related things. Then at the end of that phase you'd coerce PartialExecutionOptions to ExecutionOptions (or raise an exception) and you're good to continue through your normal flow which relies on ExecutionOptions having non-nullable |
Originally posted by @benjie in #260 (comment) :
I would like to comment on the lack of APQ in the existing draft spec: If we create a specification that only includes support for persisted queries and neglects APQ, we run the risk of excluding a significant segment of developers who find persisted queries too challenging to implement in their current setups. This concern is not just theoretical; as pointed out by @benjie recently, most developers today are using APQ over persisted queries in their workflows. Persisted queries often demand substantial modifications to both client and server architectures, such as adding a dedicated query repository. In contrast, APQ offers a simpler and more straightforward path to adoption, which can be especially important for teams with established workflows. As a result, a comprehensive spec should aim to accommodate both approaches to meet the varied needs across the developer landscape. Note that I am one of these users where the net benefits of persisted queries are minimal over APQ, and where proper persisted query support (both on client and server side) would not be worth the effort. APQ support, on the other hand, while having only a minimal benefit, could be easily adopted. I also have a separate application where persisted queries would be a perfect fit -- but it would require a significant amount of time to implement. |
I'm in no hurry to specify APQ in this, not least because Apollo's APQ spec already exists and works fine - we don't need two competing specifications for that. We'd also have to add new top-level fields both to requests and responses (or, arguably, errors) to support it, and I'm not keen to do that without the GraphQL Spec itself either adopting or "reserved keyword"-ing those changes. I'd also like to make sure there is a significant difference between the name of APQ and persisted documents; "automatic" persisted documents makes it sound like you get the benefits of persisted documents but with minimal effort; but actually you lose the most important feature: that it's an operation allowlist. Then people will combine APQ with "disabling introspection" and think that's good enough, when really at best you're just making attackers put in the tiniest bit more effort. On top of that, a good persisted operations setup lets you track which client(s) the operations are coming from, so you know how long you need to support those operations. Don't get me wrong; APQ is great as a bandwidth optimization, it just muddies the meaning of "persisted queries" and means people don't discover this powerful model that has been in use since before GraphQL was open source!
This isn't what I said, exactly. I meant that when telling people about persisted queries, they would find automatic persisted queries and think that's what we were talking about. This is the risk of having names so similar. Perhaps we should call "persisted documents" as an allowlist something like "trusted documents" instead. Actually... That seems like a good name. I'll write something up about that (separate from this discussion).
What would make it take significant time, do you think? From my perspective it seems like codegen would need to write out your documents, then a script would perform a simple hash operation on them and write them to a key-value store. Perhaps if you don't have codegen then adding that step is significant? I tend to just write my trusted documents to my monorepo as |
Isn't this equally true for white-listed persisted query support? I understand the confusion over the name and feature set, and I would tend to agree that separating the features further by calling one "trusted documents" vs "persisted queries" or similar would be beneficial. But 'trusted documents' are already supported using the protocol defined by the APQ spec, so this RFC is already a duplicate of an existing specification. It would seem to me that these two features go hand in hand, and are mostly the same feature operating in two different modes (from a technology/specification point of view).
Well, for better or worse, we have this, geared towards rapid application development:
To have white-listed documents, we would need to:
This is no small undertaking. We have an intranet application and a public web application. For the intranet application, this would have few benefits. For the public web application, it would be a great security feature while also reducing app load time. We don't use Gradle or codegen, unless it's part of the react template. The compiled SPA is delivered as static files via CDN. I should probably investigate codegen. Another solution might be to combine all GraphQL operations into a single document, and identify them within the application by operation name. The document could be stored in a separate file so that the build script can easily upload it to the server as a trusted document. A single |
There are already multiple specifications for persisted queries that work in different (but similar) ways. Relay have produced more than one over the years; here's their latest: https://relay.dev/docs/guides/persisted-queries/ . Apollo has its own one as you mention. Standardizing this so that everyone can use one common, official, spec is beneficial for all; or at least I hope so! As far as I know, there is only one specification for APQ; creating a competing specification that works in a different way is not desirable, and creating one that works in the same way is redundant. Further, we have the Thanks for sharing why you feel that trusted documents would be a lot of effort to take on, it definitely seems I'm spoiled by the ease of doing this kind of thing in the JS ecosystem! I agree the single-document approach should serve you well. |
@benjie I really like the term "Trusted Document," but I disagree with the idea that "Trusted Document = (Strict) Persisted Query." I believe that the most important aspect of a Trusted Document is that it is "trusted," and it doesn't necessarily have to be a (Strict) Persisted Query to achieve that. (Currently, (Strict) Persisted Query is the most common way to achieve "trusted," but other methods aren't mainstream yet.) Your blog is also very insightful: If you don't mind, I’d like to make a proposal: Among the existing Persisted Queries, those that are "trusted" should be called Strict Persisted Queries. |
Do you have an example of a trusted document that isn’t a persisted query in mind? |
@benjie It’s a mechanism where we attach message authentication using a shared key to a query to ensure that the query was created by internal developers. We call it a Signed Query. Example: {
"query": "query XXXX { ... }",
"extensions": {
"signedQuery": {
"signature": "<HMAC>"
}
}
} Previously, you mentioned something like "until it gets leaked..." in response, which is indeed true to some extent. However, we believe that as long as it is built securely, this approach can also be considered trusted. |
So at client build time you sign each document with a shared secret; and then bundle the documents and signatures (but not the secret!). Then you send the signature+document combo to the server, and the server can validate that the signature is trusted using the shared secret, and thus the document is permitted. This seems a valid approach; it loses some of the advantages of persisted queries such as network optimization, but gains others. I see what you're saying; i.e. the desire to indicate |
@benjie I agree with the idea of keeping names concise and not introducing too many new terms. However, in that case, wouldn’t it be convenient to have a term or category that can encompass both Trusted Documents (Persisted Queries) and Signed Queries (Signed Documents) as a single implementation pattern? These serve a distinct purpose in security protection, setting them apart even more from simpler GraphQL or APQ (Automatic Persisted Query) setups. You might think such a grouping doesn’t need a specific name, but I would love to hear your thoughts on this! |
I’ve been referring to the more general pattern as a “Document Allowlist”; does that work for you? |
The terms "persisted queries" and "stored operations" are typically used interchangably. The traditional name is "persisted queries", but the term "stored operations" acknowledges that this approach can be used with mutation and subscription operations in addition to queries.
There's lots of different implementations of this feature, often with different aims. Sometimes it's used to block all but a small allow-list of operations. Sometimes it's used to reduce network bandwidth for GraphQL requests. Sometimes it's used to improve HTTP cacheability (GET requests with short URLs).
Relevant links:
Some implementations require queries to be persisted to the server in advance of receiving queries (e.g. at build time); others negotiate the storage of the query with the server for bandwidth optimisation (e.g. automatic persisted queries).
Typically persisted queries replace thequery
in{query, variables, operationName}
with another field (typicallyid
) that indicates an identifier for the operation - perhaps a number or hash.Typically stored operations remove the
query
in{query, variables, operationName}
and instead indicate the operation via a hash or similar identifier in theextensions
property.One of the optimisations that stored operations enable is that the
parse
andvalidate
steps of GraphQL execution can be performed once per stored operation and cached; future requests for the same stored operation can jump straight to theexecute
step.The text was updated successfully, but these errors were encountered: