-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using InputStream as :ssl-context :trust-store is buggy #728
Comments
I don't recall ever using an InputStream, but as you can see here, Aleph doesn't do much with it by default. It's interesting that they fail "in quick succession". Maybe there's some delayed init that's not triggered until needed, and then if two conns both try to setup the sslcontext/trust store, they end up racing on the InputStream, and one or both fail. It looks like that happens on the client-side. It makes a new client context for each conn, which is necessary in case the context is actually just a map of options. And if you make a call slowly, the same conn will get reused, so it's not an issue there. But too fast, and it'll spawn multiple conns, corresponding multiple sslcontexts, and try to read from an exhausted stream or in the middle of an earlier conn. Solution is to either (1) force the stream into another format ASAP, or (2) disallow streams. |
@KingMob My reading of the code agrees with your analysis 👍 I'll come up with a test case to reproduce it.
I think this might actually not be necessary: It should be possible to lift the construction of the context to the level of the pool instead which would solve this bug as well as reduce allocation. Will give this a try! |
Yeah, even when SslContext construction is idempotent, why do it multiple times? |
If these instances are not thread-safe, that could be a reason. But since they are immutable, I suppose they also should be thread safe. Furthermore, I think the netty SslContext instances are based on the JDK SslContext implementations and if these weren't thread-safe it would be a widespread common problem (even though I find it hard to find explicit documentation about this). |
@bitti Good point. Though I've never really considered, should thread safety be implicit in the definition of "idempotent"? I assumed so, but we programmers are much looser about the definition than mathematicians. |
I think even in the mathematical sense, you can't 'define' it like that, since thread-unsafety implies undefined behavior. So no, neither idempotency nor immutability implies thread-safety. But I think in this case we can safely assume the JDK/netty implementations are thread-safe since otherwise it would, it make it more difficult to share connections (at least that's what I gather from the SO discussions). I gather the reason why the JDK docs don't explicitly state this is because they can't make a guarantee for the millions of potential |
Both testing contexts are failing. The serial one is to demonstrate that the InputStream cannot be read twice without resetting, which obviously is not done by Netty/Aleph. This is also the case in the concurrent context, which was intended to resemble the original report in clj-commons#728 and is a more likely scenario, since it doesn't disable keep-alive. IIUC, the concurrent scenario could fail in an even more unpleasant way, if the test certificate file was greater than the 8192-byte buffer used to read it, but ours is not (the fix would be the same). NB: `with-http-ssl-servers` already runs things twice, so `repeatedly` is not required to make it fail, but that would be harder to read and wouldn't cover (at some level, at least) both servers.
As suggested by @DerGuteMoritz in clj-commons#728. This fixes the issue and makes the test added in the previous commit pass. Keeping the `client-ssl-context` call in `http-connection` as is, even though it might seem superfluous considering the code path taken in the test, but `http-connection` is a public API, so we have to keep the call (which for us is a no-op, if we ignore the repeated ALPN check) even for our case when the protocol is https and `ssl-context` is supplied. NOTE: This highlights a difference we are introducing here. Previously, if we specified ssl-context, but the protocol wasn't https, we would just ignore the ssl-context. Currently, we are coercing it ahead-of-time, before knowing the request protocol. This could be alleviated by wrapping the coercion in a `delay`, so it won't happen until needed. However, given how unlikely this scenario seems, I have doubts whether it'd be worth it. I slightly dislike the repetition of `[:http1]` default value, but since it server as a documentation in `http-connection`, I decided to keep it as is rather than to extract it out. Also, I slightly dislike the repetition of a pattern to call `ensure-consistent-alpn-config` and then `coerce-ssl-client-context` but it's only now in 2 places, which I think is a better alternative than adding yet another ssl-coercion layer/wrapping function. Obviously, we cannot just move `ensure-consistent-alpn-config` to `ssl-client-context`, since ALPN is only for HTTP.
As suggested by @DerGuteMoritz in clj-commons#728. This fixes the issue and makes the test added in the previous commit pass. Keeping the `client-ssl-context` call in `http-connection` as is, even though it might seem superfluous considering the code path taken in the test, but `http-connection` is a public API, so we have to keep the call (which for us is a no-op, if we ignore the repeated ALPN check) even for our case when the protocol is https and `ssl-context` is supplied. NOTE: This highlights a difference we are introducing here. Previously, if we specified ssl-context, but the protocol wasn't https, we would just ignore the ssl-context. Currently, we are coercing it ahead-of-time, before knowing the request protocol. This could be alleviated by wrapping the coercion in a `delay`, so it wouldn't happen until needed. Yet, given how unlikely this scenario seems, I have doubts whether it'd be worth it. I slightly dislike the repetition of `[:http1]` default value, but since it serves as documentation in `http-connection`, I decided to keep it as is rather than to extract it out. Also, I slightly dislike the repetition of a pattern to call `ensure-consistent-alpn-config` and then `coerce-ssl-client-context` but it's only now in 2 places, which I think is a better alternative than adding yet another ssl-coercion layer/wrapping function. Obviously, we cannot just move `ensure-consistent-alpn-config` to `ssl-client-context`, since ALPN is only for HTTP.
Quoting the original report by @David-Ongaro from the addendum of #727:
The text was updated successfully, but these errors were encountered: