[API Proposal]: Access Regex group without allocations #73223
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
Issue Details
API Proposal
```csharp
namespace System.Text.RegularExpressions;

public class Match : Group
{
    public ReadOnlySpan<char> GetGroupValueSpan(int groupnum);
}
```
API Usage
```csharp
string text = "One car red car blue car";
string pat = @"(\w+)\s+(car)";
Regex r = new Regex(pat, RegexOptions.IgnoreCase);
Match m = r.Match(text);
ReadOnlySpan<char> word = m.GetGroupValueSpan(1); // matches "One"
```
Alternative Designs
Risks
No response
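For comparison, the closest equivalent available today goes through `Match.Groups`, which is exactly the allocation the proposal wants to avoid. A minimal sketch of the current way to write the usage example (same input and pattern), relying only on the existing `Capture.ValueSpan`, `Capture.Index`, and `Capture.Length` members. It also shows that a capture is just an (Index, Length) pair over the input, which is all a span-returning API would need:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "One car red car blue car";
Regex r = new Regex(@"(\w+)\s+(car)", RegexOptions.IgnoreCase);
Match m = r.Match(text);

// Today: reading ValueSpan requires going through Match.Groups, which
// creates the GroupCollection and its Group objects (the cost described in the issue).
ReadOnlySpan<char> word = m.Groups[1].ValueSpan;

// The same text, sliced directly from the input using the capture's offsets.
Group g = m.Groups[1];
ReadOnlySpan<char> sliced = text.AsSpan(g.Index, g.Length);

Console.WriteLine(word.ToString()); // One
```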
What would be the reason for that? The whole point of this new method is to avoid allocations, but then you also add a version that has an extra allocation? Especially since anyone who does need a
This one does sound useful to me.
Agree! The idea/question mark came from the fact that we provide both
Thanks for the proposal @ronaldvdv. Not sure if you are aware, but we added some amortized allocation-free APIs that loop through matches in .NET 7; in particular, we added
Interesting! Yes, indeed, if we have an amortized allocation-free option, it would be nice to extend it to include access to individual captures within the match.
Assuming you are using the same regex object to process the different inputs, then yes. When
That is the tricky part, and why it wasn't added in 7.0.
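For reference, a minimal sketch of the .NET 7 `Regex.EnumerateMatches(ReadOnlySpan<char>)` API, which works directly on a span and yields `ValueMatch` values. A `ValueMatch` exposes only the `Index` and `Length` of the whole match; there is no way to reach individual groups from it, which is the gap this issue is about. The `starts` list is only there so the result can be checked:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var regex = new Regex(@"(\w+)\s+(car)", RegexOptions.IgnoreCase);
ReadOnlySpan<char> input = "One car red car blue car";

var starts = new List<int>(); // demonstration only; not needed to enumerate
foreach (ValueMatch vm in regex.EnumerateMatches(input))
{
    // Only the whole match is available: slice it out of the input span.
    ReadOnlySpan<char> whole = input.Slice(vm.Index, vm.Length);
    starts.Add(vm.Index);
    Console.WriteLine(whole.ToString()); // ToString() only for printing
}
```

This prints `One car`, `red car`, and `blue car`; getting at `(\w+)` inside each match still requires a full `Match`.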
Thanks for providing more background! I still struggle a little to understand how the approach described in #65011 would help with the original question in this issue. I hope you don't mind me asking a follow-up here. If we would tell
It seems to me we would still need an additional method that skips the creation of
Not at all, questions are always welcome 😄
Nothing would change with the internal Match object; it would still have the same fields that track capture data. But remember that this Match object gets reused, so we can't rely on it to get the capture group data. That means that the way to get this info would have to be through the
It would improve since we wouldn't have to return a
Exactly. With this approach you only allocate
I have commented this as an alternative in #110383:
```diff
namespace System.Text.RegularExpressions;

public readonly ref struct ValueMatch
{
    // only returns successful matches
+   public ValueGroupEnumerator EnumerateGroups();
}
```
The drawback is that this doesn't match the majority of use cases, where you want to collect specific groups. I'd argue that, with many struct enumerators being added, dotnet/roslyn#66553 would actually be a nice addition for working with these APIs:
```csharp
// Hypothetical syntax: list patterns over struct enumerators do not compile today.
if (match.EnumerateGroups() is [var first, var second, ..])
if (span.Split(':') is [var first, var second])
```
However, I think it also makes sense to have both APIs next to each other.
PS: It could also support named groups through "dictionary/indexer patterns", but I think that's another discussion altogether.
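The list-pattern syntax in the comment already works in C# 11 on countable, indexable types such as arrays; what the comment hopes dotnet/roslyn#66553 would add is the same syntax over struct enumerators. A minimal sketch of the existing, allocating analogue using `string.Split`, which returns a `string[]` (unlike the span-based APIs discussed above, this allocates the array and every element):

```csharp
using System;

string line = "key:value";
string? key = null, value = null;

// Exactly two parts: bind them by position.
if (line.Split(':') is [var first, var second])
{
    key = first;
    value = second;
}

// At least two parts, the rest ignored via the slice pattern "..".
bool hasAtLeastTwo = "a:b:c".Split(':') is [_, _, ..];

Console.WriteLine($"{key} = {value}"); // key = value
```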
I just ran into this issue today while parsing a large log file. If I want to read any of the values that I matched (which I almost always do; I can't remember the last time I wrote a non-capturing regex), I have no choice but to stringify the entire span I want to search, which in this case means stringifying a gigabyte-long log file, which I'd really like to avoid. The alternative is to stringify each match as it is hit and then run the regex a second time on it (which in my case still ends up stringifying the whole file AND running every regex twice).
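The two-pass workaround from the last comment can be sketched as follows: find matches on the span with `EnumerateMatches`, filter them cheaply on the span, and only then materialize the matched slice as a string and run the regex again to read its groups. The log format, pattern, and group names (`level`, `msg`) are hypothetical. Re-running on an isolated slice is only equivalent when the match does not depend on surrounding text (for example lookbehinds or `\b` at the slice edges); the `^...$` anchors here are safe because each slice is a whole line:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical log format: "<LEVEL> <message>", one entry per line.
var regex = new Regex(@"^(?<level>[A-Z]+) (?<msg>.*)$", RegexOptions.Multiline);

// Stand-in for a slice of a much larger buffer (e.g. a chunk of a log file).
ReadOnlySpan<char> chunk = "INFO started\nWARN disk low\nINFO done";

var warnings = new List<string>();
foreach (ValueMatch vm in regex.EnumerateMatches(chunk))
{
    ReadOnlySpan<char> hit = chunk.Slice(vm.Index, vm.Length);

    // First pass: filter on the span without allocating.
    if (!hit.StartsWith("WARN", StringComparison.Ordinal)) continue;

    // Second pass: allocate only this match and re-run the regex to get groups.
    Match m = regex.Match(hit.ToString());
    if (m.Success)
        warnings.Add(m.Groups["msg"].Value);
}
```

The cost is one string per interesting match plus a second regex run on it, which is the overhead a group-aware `ValueMatch` API would remove.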
Background and motivation
In class `Capture` (namespace `System.Text.RegularExpressions`) we now have the nice addition of the `ValueSpan` property, which allows me to access the captured text efficiently. However, to access that property I would still need to go through `Match.Groups[num]`, which allocates the full `GroupCollection` and all `Group` instances. For my specific use case, that defeats the purpose a bit.
API Proposal
API Usage
Alternative Designs
string
string
Risks
No response