Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Vec requirement #392

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

therealbnut
Copy link

@therealbnut therealbnut commented Dec 12, 2024

Hi,

I've wanted to be able to store repeated values into a structure of my choosing, instead of a Vec.

This change does that, without needing a temporary intermediate Vec. I believe it will also improve performance as expressions like $(expression ** delim) will currently create a temporary Vec.

Two examples from the tests of how this may improve some usage:

-        = v:( "a" / "\n" )* { v.len() }
+        = v:( "a" / "\n" )* {
+            // Does not store the items.
+            let counter: ItemCounter = v;
+            counter.count()
+        }
     pub rule keyvals() -> HashMap<i64, i64>
-        = kvs:keyval() ++ "\n" {
-            kvs.iter().cloned().collect::<HashMap<i64, i64>>()
-        }
+        = kvs:keyval() ++ "\n" { kvs }

Currently this works by requiring the collection the values are stored into to implement Default and Extend. It may sometimes be necessary to be explicit about the type like the ItemCounter example above. It also may not be able to infer the type of the Extend<T>'s T if it implements it for multiple types, in which case the new-type pattern might be necessary. I don't imagine that will be common though.

Another pattern that might be useful is to be able to instantiate the collection type with some context. I'm not sure of a good way to do this at the moment. I looked into returning an iterator, but you still need to be able to evaluate whether parsing fails before the iterator is evaluated.

@therealbnut
Copy link
Author

therealbnut commented Dec 12, 2024

Instead of Default it may be useful to have our own trait which can be instantiated from the rule parameters (if there are any). Examples of where this might be useful is for something like string interning, or putting the AST in contiguous memory storage for cache efficiency. I'm not sure how you'd achieve this without something like specialisation though. I'll think on this more.

@kevinmehall
Copy link
Owner

I'm concerned about the reliance on type inference here. It's definitely a breaking change (which isn't necessarily a problem; there will be an 0.9). Especially because the workaround for type inference failure requires introducing a variable in the rule action, which is rather ugly.

I'm wondering if it would be better to make it explicit with something like x()* as HashSet or maybe even x()* in (context.new_list()). I used parentheses on the latter because the rust-peg metagrammar has a rule covering Rust's type expressions, but Rust value expressions are more daunting to parse, so all Rust value expressions in the grammar are wrapped in () or {} so rustc's parser is responsible for delimiting them.

For ** or ++, not sure if it would want to be written as x() ** as HashMap "," vs x() ** "," as HashMap. Somewhat awkward to have an operator with 3+ params.

Using extend(Some(x)) to do push via a std trait is kind of clever.

@therealbnut
Copy link
Author

therealbnut commented Dec 26, 2024

I'm wondering if it would be better to make it explicit with something like x()* as HashSet or maybe even x()* in (context.new_list()).

One alternative I’ve been considering is having it expressed at the rule level. This makes the type explicit on the rule definition.

A possible way to express this is if the context has the same name as part of the rule:

rule keyvals(kvs: &mut HashMap<i64, i64>) -> 
         = kvs:keyval() ++ "\n"

This would extend into the kvs context rather than creating a new variable. I haven’t checked yet, but hopefully it’s an error to shadow parameters like this currently, and so wouldn’t be a breaking change anymore.

@kevinmehall
Copy link
Owner

What's the use case where the caller constructs the container? It seems like it would be more common for the rule to want to construct and return it.

And I believe shadowing is currently allowed. Variable binding just expands to let or nested match, and Rust allows shadowing.

@therealbnut
Copy link
Author

The use case is the same as the rest of the PR. I was imagining it might be a cleaner way to express it.

Instead of your suggestion like this:

rule xs() = x()* in (context.new_list())

From the caller you would do:

xs(context.new_list())

@therealbnut
Copy link
Author

therealbnut commented Dec 26, 2024

To clarify, the use case is to be able to store it in custom collections, and use things like memory pools and flattened/contiguous storage. Although it was more about keeping the rule syntax simple and flexible than it was about the responsibility of the caller/callee.

I would expect that if you aren’t passing in the thing to be extended then it would use the old behaviour. Although that does make it cumbersome in a similar way as the type inference. It’s an idea I’m still thinking through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants