-
-
Notifications
You must be signed in to change notification settings - Fork 737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should URLSearchParams be used in node.js 8 instead of querystring? #252
Comments
So yes, it sucks that Node.js 8 introduced a bug, though obviously they should own up to it and fix the bug. Just reading the docs for the interface seems to have a few points that make it not easy to use. For example:
But a leading ? has no meaning in the POST payload and should not be stripped, and if that behavior cannot be turned off, it seems to makes that unsuitable for parsing urlencoded bodes. The docs also mention multiple times it is only for URL parsing, which also seems to make it hard to use for body parsing. Can you summarize all the differences between the new URLSearchParams class and the query string class? Understanding exactly hoe they are different would be the place for us to start. |
You make very good points @dougwilson, maybe URLSearchParams is not the way to go. I also don't know if there is an specification link between parsing "application/x-www-form-urlencoded" and url query strings (a quick look at https://url.spec.whatwg.org/#application/x-www-form-urlencoded did not clear that for me, although it was fun to read the note "The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity" 🤣 ) I also thought it was important to open this issue here to help future people stumbling with this bug, as many are likely to look here for body-parsing issues. |
Yea, I was mostly wondering what the implementation differences between those two modules in Node.js core are. I assume that URLSearchParams is not usable by this module, otherwise I'm sure if it was actually a replacement, then "querystring" would have at least been deprecated, but it is not and as far as I can tell they do different things, of which the URLSearchParams is only a replacement for URL parsing, which is not the use-case this module is using querystring for. Some PRs and such in the Node.js repo makes it look like the querystring module was more closely aligned to urlencoded posts (like what body-parser uses it for) and not URLs so instead of making it more complicated, the fixes were rolled into the new WHATWG URL parser and the queryparser module was left as it is to continue to support the other use cases (like bodies). At least, that is what I gathered. Let me know if you know otherwise, but it doesn't look like this is something we can just swap in this module, but instead need to await a fix in Node.js core. |
And in case it wasn't clear: the issue you highlighted in the queryparser is absolutely a bug; a space at the end of a value should not prevent decoding the value; it looks like it may be some kind of bug around string replacement, maybe. |
Always prepending a
Based on my understanding of what body-parser does, I would urge using |
Thanks for the information, @TimothyGu ! Can you explain what the differences are between the classes? Is |
And for anyone coming across this issue, if you would like to take a stab at whatever it would take to use |
And for those looking (trying to put together urlencoded issues here), it would be great if we have to move parsers (like to |
No,
It would not be possible for us to add any options not already in the spec to the
I assume you are talking about decoding/parsing with custom charsets here. Currently, the Node.js core only has Latin-1/UTF-8/UTF-16 decoders built-in. Based on this and WHATWG's stance
the |
Ok, so for those following along looking to make a PR, it sounds from @TimothyGu above that you cannot just drop in the This module already includes |
Correct.
The point of that Node.js PR is that you wouldn't need to plug iconv-lite into search param parsing: we would have our ICU-based implementation that will recognize basically all the encodings a browser recognizes. After the work is done, one should be able to just specify |
Re the charsets, gotcha. The idea, though, is that the charset support is at least consistent across all our body parsers. I assume that the ICU could be plugged into the rest of the parsers, at least? Also need to determine what it would look like to support the versions of Node.js without that ICU encoding and the ones with that, and also not being too familiar with ICU, I assume that means that with the ICU thing everyone's Node.js will have every encoding supported so when body-parser stops using iconv-lite users get all the encodings supported out-of-the-box with this module just like they do today? |
ICU is a piece of software written in C++, that Node.js and V8 are already using to support many Unicode- and internationalization-related things. The PR I linked will make use of ICU to provide a
Yes.
There is a JS-only
Technically, no, as iconv-lite supports some encodings It'll still be a while before the const { URLSearchParams } = require('url');
// sanity check to make sure URLSearchParams is spec-compliant
const hasURLSearchParams = Boolean(URLSearchParams) && new URLSearchParams('a&b&a').toString() === 'a&b&a';
// Preferred alternative to querystring.parse() if hasURLSearchParams
function parse(str) {
// Before an API that doesn't strip `?` is available, always add a `?` when constructing.
// TODO: change when an API that limits maximum pairs is available.
const params = new URLSearchParams(`?${str}`);
const obj = Object.create(null);
params.forEach((val, key) => {
if (obj[key] === undefined) {
obj[key] = val;
} else if (Array.isArray(obj[key])) {
obj[key].push(val);
} else {
obj[key] = [obj[key], val];
}
});
return obj;
} |
Tanks, @TimothyGu very helpful! Re one of my questions above, it was about the following quote in that PR:
I assume that meant that by default, most people will only have the "small-icu" installed, is that right? So then this module will have to use that polyfill you pointed to? |
Yes.
Hmm, I see what the problem is now. The |
In the mean time I've finished up nodejs/node#14151, which when added to a v8.x release should fix the original bug. EDIT: the pull request just landed in Node.js. It will be backported to v8.x in the next release. |
|
@stevenvachon |
@TimothyGu I thought the Node.js implementation was all JavaScript and it's not "slow", correct? |
@stevenvachon The Node.js one is JS and not slow, because it was adapted from the optimized Source: I wrote both parsers. |
I know that you wrote both and that's why I'm asking you 😛 |
blows the dust off I was trying to match the behavior of Is it time to revisit this discussion? |
Just to bump this again now node 16 is out - |
Node >= 8.0 introduced an issue in querystring.parse (see here nodejs/node#13773) that will affect all users of body-parser. It occurs when there is a trailing whitespace in a query string parameter:
Before
require('querystring').parse('a=%20+&')
was{ a: ' ' }
In node 8.x
require('querystring').parse('a=%20+&')
is{ a: '%20 ' }
This applies to any parameter value ending with a space.
Even though it is not body-parser's fault, this issue can cause non trivial problems that are very hard to track down. By updating any app that uses body-parser on older node versions to node 8.x, any url-encoded form input with a trailing space will start to have the encoding of special characters broken. It took us a long time to discover the issue and to find out where the problem was. In the issue mentioned above, it is suggested that the new URLSearchParams class is a faster way of parsing query strings and more similar to native browser implementations. Does it make sense to consider it?
The text was updated successfully, but these errors were encountered: