json parameter now accepts a bytes instance for pre-serialized json. #1730
Conversation
Appealing in some ways, but also a bit awkwardly nuanced, given that Python can often be a little careless about the str vs. bytes distinction. For example, these two examples look pretty identical on the surface...

```python
import httpx, json

httpx.post(url, json=json.dumps(data))
```

```python
import httpx, orjson

httpx.post(url, json=orjson.dumps(data))
```

But would actually result in completely different behaviour. (One case happens to output a str, the other a bytes instance, so only one would be treated as pre-serialized.)
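The str-vs-bytes distinction above can be shown with the standard library alone (a small sketch; `orjson.dumps` is what would return `bytes` directly):

```python
import json

data = {"a": 1}

s = json.dumps(data)           # stdlib json produces a str
b = s.encode("utf-8")          # orjson.dumps(data) would produce bytes like this

print(type(s).__name__)        # str
print(type(b).__name__)        # bytes

# If json= naively re-serialized whatever it was given, passing the str
# would double-encode it into a JSON string literal:
print(json.dumps(s))           # "{\"a\": 1}"
```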
It's a good point.

```python
class SerializedJSON(bytes):
    pass

httpx.post(url, json=SerializedJSON(json.dumps("asdsad").encode()))
```

This is a real blocker for us so I'm a bit pushy :p
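A minimal sketch of the dispatch a marker class like this would enable (the helper name is hypothetical, and stdlib `json` stands in for the default encoder):

```python
import json

def encode_json_param(value) -> bytes:
    """Treat bytes (including SerializedJSON-style subclasses) as
    pre-serialized JSON; encode anything else with the default encoder."""
    if isinstance(value, bytes):
        return value
    return json.dumps(value).encode("utf-8")

print(encode_json_param(b'{"a": 1}'))  # passed through untouched
print(encode_json_param({"a": 1}))     # serialized: b'{"a": 1}'
```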
Okay, noted. So... this PR only addresses the request encoding, but wouldn't alter the response decoding.

Also, do you have any kind of metrics that have informed this use case? E.g. how large are the JSON documents that you're posting? Do you have a good demo example showing that the serialization cost is significant vs. the actual network time it takes to upload? Typically the network would be the limiting factor anyway, but if we've got some good evidence that that's not the case for some reasonable use-cases, then that'd be a valid reason to prioritise #1352 / #717.
We have dictionaries sized 100-500 kilobytes. Those take about 10x the time using regular json, and converting the complex object to a dictionary by itself takes a huge amount of time, compared to serialization that handles the object on the fly. I think any benchmark of dataclass support in orjson is a good example of our performance gain from using it.
Very possibly. Although it is relevant to know if the JSON serialisation speed is significant vs. the actual network sending itself. Typically the network speed is the actual limiting factor, and improvements in serialisation speed might look relevant if measured in isolation, but might not necessarily be relevant in the bigger picture. I'm not asserting that is or isn't the case here, but I haven't seen any definitive evidence either way. A good test example might be to measure a large JSON request being posted to a local server.

(I'm absolutely open to being persuaded here, I'm just pushing back until we've really made the case for it.)
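One way to measure the serialization cost in isolation, per the suggestion above (a rough stdlib-only sketch; the document shape is invented to land in the 100-500 KB range mentioned earlier):

```python
import json
import time

# Build a document roughly in the size range discussed above.
doc = {"items": [{"id": i, "name": str(i) * 3} for i in range(10_000)]}

start = time.perf_counter()
payload = json.dumps(doc).encode("utf-8")
elapsed = time.perf_counter() - start

print(f"{len(payload) / 1024:.0f} KB serialized in {elapsed * 1000:.2f} ms")
```

Comparing this number against the wall-clock time of the full POST shows how much of each request is CPU-bound serialization rather than network transfer.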
Regarding the conversion overhead:

```python
import json
import orjson
from dataclasses import dataclass, asdict

@dataclass
class A:
    a: str

@dataclass
class B:
    a: A
    b: A
    c: A

b = B(a=A(a="Asd"), b=A(a="asd"), c=A(a="lasd"))

%timeit orjson.dumps(asdict(b))
11.8 µs ± 47.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit json.dumps(asdict(b))
15.6 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit orjson.dumps(b)
479 ns ± 2.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

This example shows that it's not about who serializes a dict faster (although orjson does it better) but the fact that serialization happens in one operation, rather than conversion then serialization.
I actually disagree that serialization shouldn't be a factor on its own even if it's not significant relative to the network limit. If you can cut 30 ms from either side, why would it matter which one? Especially when the 30 ms (30 ms as an example, of course) is CPU bound and not IO bound, and you actually save "CPU money"...
Here's a benchmark against a local httpbin:

```python
from typing import List
import dataclasses

import httpx
import orjson

@dataclasses.dataclass
class Member:
    id: int
    active: bool

@dataclasses.dataclass
class Object:
    id: int
    name: str
    members: List[Member]

objects_as_dataclass = [
    Object(i, str(i) * 3, [Member(j, True) for j in range(0, 10)])
    for i in range(100000, 102000)
]

%timeit httpx.post("http://localhost:9001/post", content=orjson.dumps(objects_as_dataclass))
247 ms ± 5.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit httpx.post("http://localhost:9001/post", json=[dataclasses.asdict(obj) for obj in objects_as_dataclass])
371 ms ± 8.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
Just to be clear, I'm entirely up for your scrutiny. You (and the whole encode team) are doing an amazing job keeping a very high-standard set of libraries, and that cannot happen without being strict and insisting on proven real-life examples rather than niche cases that complicate the codebase. I wish more open source groups were as professional.
Fantastic, thanks! Can we also see the following?...

```python
%timeit httpx.post("http://localhost:9001/post", json=[orjson.dumps(dataclasses.asdict(obj)) for obj in objects_as_dataclass])
```

It's interesting to me if the upshot of this is that the primary motivation here is "orjson can directly serialize dataclass instances, which is fast". (Which is a somewhat different motivation to "orjson is faster than json".)
I think the sensible outcome of this is likely to be something along the lines of #1352 / #717. I'm not wild about the global approach in #1352, so if we've got a proven-enough use case here, then let's do this properly, right?
The example you wrote doesn't make sense, so I assume you meant something like this:

```python
%timeit httpx.post("http://localhost:9001/post", content=orjson.dumps([dataclasses.asdict(obj) for obj in objects_as_dataclass]))
346 ms ± 9.91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

I think a jsonlib argument is not sufficient, as different JSON libraries might have different options (for example, we use the orjson numpy option that can be passed).
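To illustrate the point about library-specific options, here's a hypothetical sketch in which the client accepts an encoder callable rather than a library name, so options can be baked in at the call site (all names here are invented; stdlib `json` is the default):

```python
import json
from typing import Any, Callable

JsonEncoder = Callable[[Any], bytes]

def default_encoder(obj: Any) -> bytes:
    # Fallback: stdlib json, encoded to bytes.
    return json.dumps(obj).encode("utf-8")

def encode_request_body(obj: Any, encoder: JsonEncoder = default_encoder) -> bytes:
    """Hypothetical hook: the caller supplies the encoder, so options like
    orjson.OPT_SERIALIZE_NUMPY can be captured in a closure."""
    return encoder(obj)

# A caller using orjson could pass e.g.:
#   encoder=lambda o: orjson.dumps(o, option=orjson.OPT_SERIALIZE_NUMPY)
print(encode_request_body({"a": 1}))  # b'{"a": 1}'
```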
IMO the signature should be something like:

```python
def json_encode(json: Any) -> bytes: ...
def json_decode(data: Union[str, bytes]) -> Any: ...
```
The docs on https://github.com/encode/httpx/pull/1352/files#diff-0f0c40de44a0841ed5180f16d66212d1cece2a7d4400789007c2f73bf8432622 detail how you'd be able to handle that. (Though #1352 proposes a global approach.)
I'll be a bit picky and say that it'll force double encoding... but OK 😅
So, all of the proposed options still leave me with an awkward, niggly uncomfortableness. I've also got the seed of some thoughts in #1740 which again is pushing me away from wanting to add extra API in this area. Right now, the way I'd currently work with your use-case is probably something like this...

```python
class CustomClient(httpx.Client):
    def request(self, *args, **kwargs):
        if 'json' in kwargs:
            kwargs['content'] = orjson.dumps(kwargs.pop('json'))
            kwargs.setdefault('headers', {})
            kwargs['headers']['Content-Type'] = 'application/json'
        return super().request(*args, **kwargs)
```

It's also interesting, having talked through your requirements here, that the primary benefit is orjson's ability to serialize dataclass instances directly, rather than its raw serialization speed.
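The subclass approach can be exercised without a network by isolating the kwarg rewrite it performs (a stdlib-only sketch; `rewrite_json_kwargs` is a made-up helper, with `json.dumps` standing in for `orjson.dumps`):

```python
import json

def rewrite_json_kwargs(kwargs: dict) -> dict:
    """Replicate the override's transformation: pop json=, pre-serialize it
    into content=, and set the Content-Type header."""
    kwargs = dict(kwargs)  # avoid mutating the caller's dict
    if "json" in kwargs:
        kwargs["content"] = json.dumps(kwargs.pop("json")).encode("utf-8")
        kwargs.setdefault("headers", {})
        kwargs["headers"]["Content-Type"] = "application/json"
    return kwargs

print(rewrite_json_kwargs({"json": {"a": 1}}))
```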
This seems like a good solution, though I think it would be better if the client supported custom JSON serialization directly.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I know this might be outdated, but I thought I'd add one more use case here.
This is a take on #1352 / #717. I think it offers a pretty good way to provide the functionality easily, without global hacks.