Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Razor document formatting engine #11364

Merged

Conversation

davidwengier
Copy link
Contributor

@davidwengier davidwengier commented Jan 8, 2025

Fixes #10402
Fixes #10864

This PR moves from the old document formatting engine, to a new one that doesn't use the normal Razor compiler generated C# document, and is therefore able to work regardless of whether the system is configured for runtime or design time code-gen (ie, FUSE or not). Some details about how it works can be seen in the doc comments in CSharpFormattingPass.CSharpDocumentGenerator.cs. The existing engine is still used for On Type Formatting, as well as formatting the results of code actions, OnAutoInsert, snippet completions, etc. This means there isn't as much code removal as I'd like, but I'm hoping to follow up with more PRs in future to address some of this.

Looking at this commit at a time is probably overkill, but would be workable if you prefer it. If not, there are a couple of spots where the final git diff won't do you any favours, and I'll comment on them as appropriate.

As proof of the benefits of the new design, this also fixes #7933 almost by accident, by which I mean by design, but without specific action.
Finally, in a meeting about this engine Andrew asked how this engine would work for lambda attributes, and so I added tests for that, and also accidentally (ie deliberately) fixes #9777

I can't remember what this is for, but a passing formatting test isn't the worst thing in the world
Not sure if its a mark for or against the old engine, that these tests used to pass
These cover additional cases that would have shown up bugs in code as I was writing it
We don't need to do much with the new engine, just stop the couple of places where Html formatting actively breaks Razor
Basic theory is to generate a synthetic C# document, format it, and then interpret the results.
Waiting on the team consensus as to whether this is feature flag worthy
… formatter

This class could do with a complete clean up, but I'm saving that for a follow up once we know if the old engine is still needed in parallel
The approach to implicit and explicit expressions couldn't handle them if they weren't at the start of a line, so these tweaks set us up for a different approach where that is handled.
Fixing these was not intentional, but probably speaks to the value of the new approach avoiding the generated C#
Each of these cases is an objective win, making Razor formatting behave the same way as Roslyn formatting.
Arguably the old behaviour is better, but given these are invalid Razor documents, it doesn't seem to me to be worth worrying about. Also this could improve with better recovery behaviour in the compiler.
This is one of those situations where Roslyn simple doesn't indent anything, so we have to do a little bit more work to set things up for a better chance of success.
@davidwengier davidwengier requested a review from a team as a code owner January 8, 2025 05:04
},
Data = Model.WorkOrders,
Title = "Work Orders"
};
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are a result of the new engine fixing #7933

{
First = 1,
Second = 2
};
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are a result of the new engine fixing #7933

{
First = 1,
Second = 2
};
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are a result of the new engine fixing #7933

Copy link
Member

@DustinCampbell DustinCampbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the helpful notes!

_builder.AppendLine();
_builder.AppendLine(additionalLinesBuilder.ToString());

return SourceText.From(_builder.ToString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perf note: This will create a string on the LOH for sufficiently large files. Consider whether there's something better to use here than a StringBuilder so a TextReader can be passed to SourceText.From. That could be captured in an issue for later work, of course.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely planning on perf being a separate PR. The old engine used to re-compile the Razor file 3 or 4 times, so we get a huge perf win from just not doing that, that it didn't seem worth complicating this PR. There are definitely some low hanging fruit, though I'd probably want to fix on-type formatting first, since that also recompiles Razor files a lot (and each time, the compiler produces the entire generated C# file as a string!).

I'll capture this and the blow comment in a new issue though, so thanks for the hint.

Comment on lines 63 to 65
var indentationString = formattedCSharpText.ToString(new TextSpan(formattedLine.Start, formattedIndentation))
+ htmlIndentString
+ lineInfo.AdditionalIndentation;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perf note: Consider having a table indentation strings for common cases to avoid allocating a new string for each line.

@davidwengier
Copy link
Contributor Author

Apologies, this PR was a bit of a bet towards putting this in without a feature flag, but we decided today that we do want a feature flag, so this PR is going to get a little noisier. The final diff should actually be simpler though.

@davidwengier davidwengier requested a review from a team as a code owner January 10, 2025 03:46
@davidwengier
Copy link
Contributor Author

Apologies for the diff, I didn't rebase but I may as well have considering the whole new formatting engine has moved (back) to a separate file. The final diff is reasonably clean though, only 37 files changed, and all feedback has been addressed.

The new engine is now behind a feature flag, and is off by default.

@davidwengier davidwengier removed the request for review from a team January 10, 2025 03:47
Copy link
Contributor

@ryzngard ryzngard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

public static CSharpFormattingDocument Generate(RazorCodeDocument codeDocument, RazorFormattingOptions options)
{
using var _ = StringBuilderPool.GetPooledObject(out var builder);
using var lineInfo = new PooledArrayBuilder<LineInfo>(capacity: codeDocument.Source.Text.Lines.Count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: lineInfoBuilder

private TextLine _currentLine;
private int _currentFirstNonWhitespacePosition;

// These are set in GetCSharpDocumentContents so will never be observably null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ why are these lazily initialized? I would think they realistically are always going to be created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think of these as lazy, they're just fields by necessity. _currentToken tracks the current token being processed, but there is no way to pass that in to the Visit methods. As such, there simply isn't a default value that makes sense to intiialize it to, so I used null!.

_builder is owned externally so it can come from a pool, without worrying about leaking pooled objects. Doing that was based off a suggestion from Dustin earlier in the PR, and I think it makes sense.

It's a bit of a compromise that they have to exist, but unless we bring dotnet/roslyn#55183 over to Razor, I don't know another way. I'm all ears if you have a suggestion though?

Though having written all of that, I think maybe I've misunderstood your question/comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my brain scanned quickly. The StringBuilder being null and set later is really the oddity and I assumed below field(s) followed suit.

Looking at https://github.com/dotnet/razor/pull/11364/files/2e3adc7ab43a1deb9ed07684d745f69f8f50c8e9#r1911738908 I think that's potentially better representing things: this class is a bit of an oddball still.

Having SyntaxVisitor<TArgument, TResult> could help, but for lifetime of StringBuilder etc couldn't that be solved by making this disposable? and/or a ref struct since this never needs to escape this method or go to the heap.

Maybe viewing this class similar to a stream operation would be helpful. Thinking while typing: it operates on lines in a forward only manner. It generates SourceText and LineInfo after "consuming" the stream. You have a Current which is the current TextLine, some hardcoded metadata about formatting options, and a pointer back to the original SourceText. Maybe separating the enumeration and the generation would clean things up?

This is definitely not blocking and more just bouncing ideas. Can be left alone or done in a follow up

Copy link
Contributor Author

@davidwengier davidwengier Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely oddball. I wished for ref struct at one point too, but the class inherits from SyntaxVisitor, so that can't work. I like the stream idea, but again, I think its limited by SyntaxVisitor (the enumerator would have to set a bunch of private state fields on the generator). I do wonder if that is something that is worth changing. It's not really much of a visitor, since it absolutely can't just visit an arbitrary tree and it almost never actually uses the visitor pattern properly, but it was very useful for it to be one during development. Being able to type override and have completion show me which node types were available, unhandled candidates was great.

Maybe now that development is finished (lol!), it would be more sensible to simply have a method that has a giant switch expression, which would allow more freedom of design? Probably could even drive the whole thing from a single static method.

Will add a note to the follow up issue to keep thinking about this.

if (children is [.., MarkupBlockSyntax block, RazorMetaCodeSyntax /* close brace */] &&
!context.CodeDocument.GetCSharpDocument().SourceMappings.Any(m => block.Span.Contains(m.OriginalSpan.AbsoluteIndex)))
// This doesn't cause any harm with the new engine, but its a waste of effort.
if (!_languageServerFeatureOptions.UseNewFormattingEngine)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should this move before the if above just to reduce some work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow exactly. There is no work to avoid here, it isn't an early return, and the above if produces things this needs.

}

[FormattingTestFact]
public async Task StartsWithWhitespace()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness do we need an EndsWithWhitespace? I Guess it's covered here (and probably elsewhere)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to add one, but my approach for creating new tests was simply pure reaction. Either there were quirks/bugs I found during development, or conditions I was adding that I wanted to validate. So if there is no EndsWithWhitespace, it means there was nothing odd about the tree when it came to formatting it :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say just add it for now. If nothing else serves as a regression test in case the tree changes. The tests are great, run quick, and provide a baseline comfort for changing both the syntax tree in the compiler and formatting behavior in tooling. I'd just err on more coverage even if it's not proven to be needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run quick

They ran a lot quicker before I made them all run 8 times 😁

Copy link
Member

@DustinCampbell DustinCampbell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

private int _currentFirstNonWhitespacePosition;

// These are set in GetCSharpDocumentContents so will never be observably null
private StringBuilder _builder = null!;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that this can be passed into the constructor and the field can be readonly now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be honest, I have yet to find any implementation of this class that I actually like fully. I did this change at one point, but it felt like if I was doing that, I may as well just pass the lineInfo array in too, and do all of the work in the constructor. Which also felt wrong :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that'd be wrong. 😄 FWIW, I was only noting that _builder could be passed to the constructor. I don't think it's necessary since it's harder to pass the PooledArrayBuilder<LineInfo> there. However, if you wanted to pass both to the constructor, you could just grab an ImmutableArray<LineInfo>.Builder from ArrayBuilderPool and forego the PooledArrayBuidler` savings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I guess PooledArrayBuilder is only helping us with very short documents.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. Alnd if things become a problem using the default pool, formatting could create and own its own pool.

@davidwengier
Copy link
Contributor Author

Thanks for the reviews. Going to merge this, but there will be follow ups, so if I've misinterpreted anything or there is more to do, feel free to comment on #11371

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants