Long GC Suspend and hard faults #111201

Open
arekpalinski opened this issue Jan 8, 2025 · 7 comments

@arekpalinski

Description

We (RavenDB) have a customer that is occasionally experiencing very long GC pauses. It's not a new issue or a regression; it has been happening for a long time, already on .NET 6. Some time ago RavenDB was updated to .NET 8 (in order to be on an LTS version); more specifically, version 8.0.400 is deployed on the affected environment.

The pauses affect the application's requests to RavenDB. Monitoring also shows connectivity issues between the nodes (the setup is a 3-node RavenDB cluster).

Using PerfView, we have so far narrowed it down to GC Suspend events taking a very long time.

Reproduction Steps

This happens only occasionally and only in a production environment. RavenDB is configured as a 3-node (3-machine) cluster, and the issue occurs randomly on any node. The machines have 256 GB of memory and run Windows Server 2019 Datacenter (OS build: 17763.5696.amd64fre.rs5_release.180914-1434). They are hosted in Azure.

Expected behavior

GC pauses should not take this long.

Actual behavior

We'd like to share our analysis of a recent occurrence of the issue. We have three PerfView traces that were collected using the following command:

PerfView.exe /nogui /accepteula /BufferSizeMB:4096 /CircularMB:4096 /CollectMultiple:3 /StopOnGCSuspendOverMSec:5000 /KernelEvents=ThreadTime+Memory+VirtualAlloc /NoNGenRundown /Merge:true /Zip:true collect /Process:"Raven.Server" Over5SecGCSuspend

The longest GC suspend took 18,724.797 mSec (time range: 244,459.607 - 263,347.329):

Image

The analysis below is about this GC suspend event (although the other two PerfView outputs are very similar).

Events

  • The affected GC thread, the one doing the suspension, is 2908

Image

Thread Time (with Ready Threads) Stacks

Analysis of coreclr!ThreadSuspend::SuspendEE

  • There are a lot of READIED BY TID(0) Idle (0) CPU Wait < 1ms IdleCPUs events, which mostly point to ntoskrnl!??KiProcessExpiredTimerList (READIED_BY):
    Image

  • Below we can also find two RavenDB threads - 5108 and 6524:
    Image

Looking deeper, we can see that both threads are performing queries:

Image

Looking at the stacks of thread 5108, we can see that it's reading documents from disk (we use memory-mapped files), causing a page fault:

Image

This is about the following code:

https://github.com/ravendb/ravendb/blob/1c79a8d9131b248cfe129f7ad516495f31942584/src/Voron/Impl/LowLevelTransaction.cs#L648-L655

https://github.com/ravendb/ravendb/blob/1c79a8d9131b248cfe129f7ad516495f31942584/src/Voron/Impl/Paging/AbstractPager.cs#L332
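
For illustration, a minimal sketch of how a managed read from a mapped view can turn into a hard page fault when the page is not resident (this is not RavenDB's actual pager code; the path and offset are hypothetical):

    using System.IO;
    using System.IO.MemoryMappedFiles;

    static class MappedReadSketch
    {
        // Reads a single byte from a memory-mapped file. If the OS has
        // trimmed that page from the working set, the dereference below is
        // a hard fault: the thread blocks in the kernel until the page is
        // read back from disk.
        static unsafe byte ReadByteAt(string path, long offset)
        {
            using var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open);
            using var accessor = mmf.CreateViewAccessor();

            byte* ptr = null;
            accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
            try
            {
                return *(ptr + offset); // potential hard page fault here
            }
            finally
            {
                accessor.SafeMemoryMappedViewHandle.ReleasePointer();
            }
        }
    }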

Events (ReadyThread)

ReadyThread events for AwakenedThreadId 2908 (the GC thread doing the suspension) show it was awakened by the previously mentioned Raven threads, 5108 and 6524, but also by the Idle (0) process, so I assume it's the System process.

Image

CPU Stacks

From the CPU Stacks we know that before the long GC suspend, we had started FlushFileBuffers() on our Raven.voron file, where documents are saved (and which threads 5108 and 6524 are reading from). We call it explicitly and periodically here:

https://github.com/ravendb/ravendb/blob/1c79a8d9131b248cfe129f7ad516495f31942584/src/Voron/Platform/Win32/WindowsMemoryMapPager.cs#L344

It was still running during the GC suspend. Under the covers we can see an MmTrimSection() call, which as we understand evicts some pages, so subsequent access to the trimmed pages results in a page fault, requiring the system to re-read the data from the file (which is what we see in threads 5108 and 6524).

Image
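
For reference, a simplified sketch of that flush path (not the actual WindowsMemoryMapPager code; handles and addresses are placeholders):

    using System;
    using System.ComponentModel;
    using System.Runtime.InteropServices;
    using Microsoft.Win32.SafeHandles;

    static class FlushSketch
    {
        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool FlushViewOfFile(IntPtr lpBaseAddress, UIntPtr dwNumberOfBytesToFlush);

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool FlushFileBuffers(SafeFileHandle hFile);

        // Periodic sync of a mapped data file: flush the dirty mapped pages,
        // then ask the OS to flush the file's buffers to stable storage.
        // According to the traces above, while FlushFileBuffers is running the
        // kernel may also trim pages of the section (MmTrimSection), so later
        // reads of those pages hard-fault and must be re-read from disk.
        public static void Sync(IntPtr baseAddress, UIntPtr length, SafeFileHandle fileHandle)
        {
            if (!FlushViewOfFile(baseAddress, length))
                throw new Win32Exception(Marshal.GetLastWin32Error());

            if (!FlushFileBuffers(fileHandle))
                throw new Win32Exception(Marshal.GetLastWin32Error());
        }
    }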

Events (Suspend EE and Hard Faults)

Going back to Events view and adding Hard Faults events:

Image

So from our analysis it looks like the GC suspend was caused by hard faults taking 18-19 seconds (we are not sure why) following the recent FlushFileBuffers().

Questions

  1. Can you please validate our analysis?
  2. Can you please confirm our assumption about MmTrimSection()?
  3. Is it an expected / known issue that a managed thread taking a hard page fault can block GC suspension?
  4. Is there any kind of OS / kernel lock that could affect a thread doing a read and result in 18-19 second hard faults?

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jan 8, 2025
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jan 8, 2025
@huoyaoyuan huoyaoyuan added area-GC-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jan 8, 2025

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

@Maoni0 Maoni0 added area-VM-coreclr and removed area-GC-coreclr untriaged New issue has not been triaged by the area owner labels Jan 8, 2025

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@Maoni0

Maoni0 commented Jan 8, 2025

looping in @kouvel for suspension issue. thanks for the detailed investigation!

@markples

markples commented Jan 8, 2025

just a few observations - I don't know details about the flush/trim part, but yes, if a normal memory operation faults, then that thread won't have been put into a "GC ready" state (as opposed to the flush itself, which the DllImport mechanism can do). (glossing over details about fully interruptible code vs not)

Presumably the file size (seeing the file offset of 815,000,000,000) is part of it.
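
To illustrate the distinction (a rough sketch, glossing over the runtime details; the names and the mapped pointer are hypothetical):

    using System.Runtime.InteropServices;
    using Microsoft.Win32.SafeHandles;

    static class SuspensionSketch
    {
        // 1) A P/Invoke call: the runtime switches the calling thread to
        //    preemptive mode for the duration of the native call, so GC
        //    suspension does not have to wait for it even if the call
        //    blocks for a long time.
        [DllImport("kernel32.dll", SetLastError = true)]
        public static extern bool FlushFileBuffers(SafeFileHandle hFile);

        // 2) A plain managed read of mapped memory runs in cooperative
        //    mode. If it hard-faults, the thread is stuck in the kernel and
        //    cannot reach a GC-safe point until the page is paged back in,
        //    so SuspendEE has to wait for it.
        public static unsafe byte ReadMappedByte(byte* mappedAddress)
        {
            return *mappedAddress; // potential hard fault in cooperative mode
        }
    }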

@arekpalinski

Yes, the file in question is pretty large - 862 GB. Here are the details of a sample sync operation on that file (FlushViewOfFile + FlushFileBuffers). Note it's from a different time, so it's not about the GC suspension in question:

Image

According to our own metrics we had 49.5 MB of written data that had not yet been synced. It's literally the measurement of the following piece of code:
https://github.com/ravendb/ravendb/blob/1c79a8d9131b248cfe129f7ad516495f31942584/src/Voron/Platform/Win32/WindowsMemoryMapPager.cs#L327-L349
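
For context, a rough sketch of how such a metric could be taken around the two flush calls (hypothetical names; not the actual measurement code):

    using System;
    using System.Diagnostics;

    class SyncMetricsSketch
    {
        long _bytesWrittenSinceLastSync; // incremented by the write path (not shown)

        // Measures how long the FlushViewOfFile + FlushFileBuffers pair takes
        // and how much written-but-unsynced data it covered (e.g. the 49.5 MB
        // reported above).
        public TimeSpan SyncAndMeasure(Action flushViewOfFile, Action flushFileBuffers)
        {
            long pendingBytes = _bytesWrittenSinceLastSync;
            var sw = Stopwatch.StartNew();

            flushViewOfFile();
            flushFileBuffers();

            sw.Stop();
            _bytesWrittenSinceLastSync = 0;
            Console.WriteLine($"Synced {pendingBytes / (1024.0 * 1024.0):F1} MB in {sw.Elapsed}");
            return sw.Elapsed;
        }
    }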

@arekpalinski

We also have other PerfView traces, collected about 15 minutes earlier, where we also had a GC suspension taking 6.8 seconds. During that time we can likewise see very long hard page faults (meanwhile there are others, even with similar offsets, taking below 1 ms. I know it doesn't mean anything, but it's interesting):

Image

Dropping the HardFault events and showing more FlushInit events, which happened much earlier (since the hard fault durations were about 20 seconds, I assume earlier flushes must have caused them), shows us:

Image

@arekpalinski

Something that I have trouble understanding is that we see a lot of FlushInit events, but almost no FlushBuffers events, even though the Any Stacks view shown for any FlushInit event points to our FlushFileBuffers.

Can it be that, as shown in one of the screenshots from yesterday, most of our FlushFileBuffers time is actually spent in MmTrimSection?

Image
