Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ast: directive-prologue #8219

Closed
Boshen opened this issue Jan 2, 2025 · 5 comments
Closed

ast: directive-prologue #8219

Boshen opened this issue Jan 2, 2025 · 5 comments
Assignees
Labels
C-bug Category - Bug

Comments

@Boshen
Copy link
Member

Boshen commented Jan 2, 2025

https://tc39.es/ecma262/#directive-prologue

A Directive Prologue is the longest sequence of ExpressionStatements occurring as the initial StatementListItems or ModuleItems of a FunctionBody, a ScriptBody, or a ModuleBody and where each ExpressionStatement in the sequence consists entirely of a StringLiteral token followed by a semicolon. The semicolon may appear explicitly or may be inserted by automatic semicolon insertion (12.10). A Directive Prologue may be an empty sequence.

Oxc AST does not conformance to this definition.

Oxc produces

directives: [ Directive { .. } ]
body: [  ]

it should be at least

directives: [ Directive { .. } ]
body: [ ExpressionStatement { .. } ]
@Boshen Boshen added the C-bug Category - Bug label Jan 2, 2025
@Boshen Boshen self-assigned this Jan 2, 2025
@overlookmotel
Copy link
Contributor

function f() {
  'use strict';
}

Are you saying that the AST for this function's body should look like this?

FunctionBody {
  directives: [
    Directive { StringLiteral("use strict") },
  ],
  statements: [
    ExpressionStatement {
      expression: Expression(StringLiteral("use strict")),
    },
  ],
}

If so, in my opinion, I don't think we need to do that in order to remain compliant with the spec.

My reading of the passage you linked to is that:

  • It describes the Directive Prologue as a thing which a parser needs to recognise.
  • It describes what code comprises the Directive Prologue.
  • It states that the engine needs to identify a Use Strict Directive within the Prologue, and process code within the function as strict mode code.

But what it doesn't say is how the Directive Prologue needs to be stored internally in the engine. All it says is:

A Use Strict Directive is an ExpressionStatement in a Directive Prologue

So I think we are compliant. FunctionBody::directives is the Directive Prologue, and we store directives in it. The spec doesn't say that directives also need to be stored elsewhere.

OK, we're storing directives as Directive rather than ExpressionStatement. But there seems little point in using ExpressionStatement(Expression(StringLiteral)) since the expression is statically known to be a StringLiteral.

Sorry if I've completely misinterpreted what you're saying and have got confused!

@aapoalas
Copy link

I agree that oxc isn't actually breaking with the spec here. The relevant part of the spec is this:

The ExpressionStatements of a Directive Prologue are evaluated normally during evaluation of the containing production. [1]

This isn't really directly relevant to parsers, only for engines themselves. And since oxc does expose this data, everything is technically fine and correct.

The only question is whether this is a good API or not. First, from an engine point of view I need to add a conditional path for this (though that's not too bad, really) and it's a fairly obscure thing to discover. From an AST transformer point of view, the same obscurity might become a real source of bugs: Any transformer should remember to check and copy the directives (at least a use-strict directive) during their transform separately from the normal AST work. If they forget it, the code they produce may end up deoptimised or even subject to sloppy-mode related security vulnerabilities.

From a memory point of view I wonder if this makes sense either: You need a whole extra Vec for the directives. If the directives were at the beginning of the body, you could replace the Vec with a single u8 that tells how many directives there are at the beginning.

@overlookmotel
Copy link
Contributor

From a memory point of view I wonder if this makes sense either: You need a whole extra Vec for the directives. If the directives were at the beginning of the body, you could replace the Vec with a single u8 that tells how many directives there are at the beginning.

This is a good point. Most functions have no directives, so we are using 32 bytes in every FunctionBody for a Vec which is almost always empty. An alternative would be a ThinVec type (8 bytes) for vecs like this which are usually empty (oxc-project/backlog#113). OK, that's 8 bytes instead of 1 byte, but it gets us most of the way. And would a u8 be sufficient for "number of directives" anyway? It's legal (though very weird) to have any number of directives.

There are also other convoluted ways to save space e.g. store directives in a HashMap indexed by NodeId. But we're trying to strike a balance between compact memory representation and simplicity of design. So if we want to save on memory, personally I prefer the ThinVec approach due to its relative simplicity. Also, on the memory usage front, we have considerably bigger fish to fry! (oxc-project/backlog#18, oxc-project/backlog#91)

Note: The fastest way to determine whether some code is strict mode or not is to check the ScopeFlags::StrictMode flag of the current scope. I don't know if you're using Semantic in Nova?

@overlookmotel
Copy link
Contributor

TLDR: Personally, I prefer it as it is now. If we agree that it's spec-compliant, then the question (as you say) is what's the best API. Currently we represent directives as a type called Directive, in a field called directives. You can't get much clearer than that!

@Boshen
Copy link
Member Author

Boshen commented Jan 19, 2025

Changing the AST shape is too interruptive at this point. Nova can workaround this limitation with an additional code path. Close as not planned :-(

@Boshen Boshen closed this as not planned Won't fix, can't repro, duplicate, stale Jan 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category - Bug
Projects
None yet
Development

No branches or pull requests

3 participants