a new GC API for large array allocation #27146

Open
Maoni0 opened this issue Aug 15, 2018 · 112 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime

@Maoni0
Member

Maoni0 commented Aug 15, 2018

To give users with high perf scenarios more flexibility for array allocations I propose to add a new API in the GC class.

Rationale

Below are the mechanisms we would like to support for high perf scenarios:

  • coreclr dotnet/runtime#20704;
  • choose whether you want to allocate the object as a gen0 object or in the old generation;
  • choose whether you want to pin the object you are requesting to allocate;

I am also thinking of exposing the large object size threshold as a config to users and this API along with that config should help a lot with solving the LOH perf issues folks have been seeing.

Proposed APIs

class GC
{
    // generation: -1 means to let GC decide (equivalent to regular new T[])
    // 0 means to allocate in gen0
    // GC.MaxGeneration means to allocate in the oldest generation
    //
    // pinned: true means you want to pin this object that you are allocating
    // otherwise it's not pinned.
    //
    // alignment: only supported if pinned is true.
    // -1 means you don't care which means it'll use the default alignment.
    // otherwise specify a power of 2 value that's >= pointer size
    // the beginning of the payload of the object (&array[0]) will be aligned with this alignment.
    static T[] AllocateArray<T>(int length, int generation=-1, bool pinned=false, int alignment=-1)
    {
        // calls the new AllocateNewArray fcall.
        return AllocateNewArray(typeof(T).TypeHandle.Value, length, generation, pinned, clearMemory: true);
    }

    // Skips zero-initialization of the array if possible. If T contains object references, 
    // the array is always zero-initialized.
    static T[] AllocateUninitializedArray<T>(int length, int generation=-1, bool pinned=false, int alignment=-1)
    {
        return AllocateNewArray(typeof(T).TypeHandle.Value, length, generation, pinned, clearMemory: false);
    }
}

Restrictions

Only array allocations are supported via this API

Note that I am returning a T[] because this only supports allocating large arrays. it's difficult to support allocating a non array object since you'd need to pass in args for constructors and it's rare for a non array object to be large anyway. I have seen large strings but these are unlikely to be used in high perf scenarios. and strings also have multiple constructors...we can revisit if string is proven to be necessary.

Minimal size supported

Even though the size is no longer restricted to >= LOH threshold, I might still have some sort of size limit so it doesn't get too small. I will update this when I have that exact size figured out.

Perf consideration

Cost of getting the type

The cost of "typeof(T).TypeHandle.Value" should be dwarfed by the allocation cost of a large object; however in the case of allocating a large object without clearing memory, the cost may show up (we need to do some profiling). If that's proven to be a problem we can implement coreclr dotnet/corefx#5329 to speed it up.

Pinning

We'll provide a pinned heap that is only for objects pinned via this API. So this is for scenarios where you

  • have control over the allocation of the object you want to pin and
  • want to pin it as long as it's alive

Since we will not be compacting this heap, fragmentation may be a problem, so as with normal pinning, it should be used with caution.

I would like to limit T for the pinning case to contain no references. But I am open to discussion on whether it's warranted to allow types with references.

@danmoseley
Member

Edited proposal to match naming guidelines

@jkotas supportive of this going to api review?

@jkotas
Member

jkotas commented Aug 15, 2018

Nit: The method should be static.

@jkotas supportive of this going to api review?

Yes.

@jkotas
Member

jkotas commented Aug 15, 2018

return AllocateNewArray(typeof(T).TypeHandle.Value, length, generation, clearMemory);

I think this should rather be return AllocateNewArray(typeof(T[]), length, generation, clearMemory) ... but that's an implementation detail we can figure out later.

@4creators
Contributor

IMO it would be a good place to add alignment control for GC allocations. An additional parameter or an additional overload would serve the purpose very well.

class GC
{
    // generation: -1 means to let GC decide
    // 0 means to allocate in gen0
    // GC.MaxGeneration means to allocate in the oldest generation
    T[] AllocateLargeArray<T>(int length, int generation=-1, int alignment = -1, bool clearMemory=true)
    {
        // calls the new AllocateNewArray fcall.
        return AllocateNewArray(typeof(T).TypeHandle.Value, length, generation, clearMemory);
    }
}

where alignment value -1 means GC decides and any value > 0 asks for allocation alignment as specified by caller.

See dotnet/csharplang#1799 [Performance] Proposal - aligned new and stackalloc with alignas(x) for arrays of primitive types and less primitive as well

@jkotas
Member

jkotas commented Aug 20, 2018

alignment control for GC allocations

This problem has been discussed in https://github.com/dotnet/corefx/issues/22790 and related issues.

@terrajobst
Contributor

terrajobst commented Aug 28, 2018

Video

  • Should it just be AllocateArray? In the end, the developer controls the size.
  • What happens if the developer specifies gen-0 but wants to create a 50 MB array? Will the API fail or will it silently promote the object to, say, gen-1?
  • No clearing the memory is fine, but we want to make sure it shows up visibly on the call side (a plain false isn't good enough). We'd like this to be an overload, such as AllocateLargeArrayUninitialized. The other benefit of having an overload is that this could be constrained to only allow Ts with no references (unmanaged constraint).
  • Is LOH the same as MaxGeneration? If not, how can a developer explicitly allocate on the LOH?

@jkotas
Member

jkotas commented Aug 28, 2018

We'd like this to be an overload, such as AllocateLargeArrayUninitialized

Agree. Did you mean AllocateUninitializedArray ?

The other benefit of having an overload is that this could be constrained to only allow Ts

I do not think we want the unmanaged constraint. It would just make this API more painful to use in generic code for no good reason. GC should zero-initialize the array in this case. Note that the array will be zero-initialized in many cases anyway when the GC does not have an uninitialized block of memory around. The flag is just a hint to the GC that you do not care about the content of the array.

@Maoni0
Member Author

Maoni0 commented Aug 29, 2018

Should it just be AllocateArray? In the end, the developer controls the size.

this is only meant for large array allocation, ie, arrays larger than the LOH threshold.

What happens if the developer specifies gen-0 but wants to create a 50 MB array? Will the API fail or will it silently promote the object to, say, gen-1?

that's something we need to decide. but if it fails to allocate anything in gen0 it would revert to the default behavior (ie, on LOH).

No clearing the memory is fine, but we want to make sure it shows up visibly on the call side

I am not sure why this needs to be an overload but not the other aspects. why wouldn't there be a AllocateLargeArrayInYoungGen overload too, then?

Is LOH the same as MaxGeneration? If not, how can a developer explicitly allocate on the LOH?

LOH is logically part of MaxGeneration.

I do not think we want the unmanaged constraint. It would just make this API more painful to use in generic code for no good reason.

this API is not for generic code though. I would only expect people with very clear intentions for perf to use this. and if you specify to not clear, I think, if I were a user, it would be more desirable to indicate an error if that can't be done (ie, the type has references) instead of silently taking much longer.

after the discussion it seems like this API should perhaps take another parameter that indicates whether the operation succeeded or not, eg, AllocateLargeArrayError.TooBigForGen0, AllocateLargeArrayError.MustClearTypesContainsReferences. however I will leave this decision to API folks.

@jkotas
Member

jkotas commented Aug 29, 2018

this API is not for generic code though

What makes you think that it is not? It is very natural to use these APIs to implement generic collections optimized for a large number of elements.

@jkotas
Member

jkotas commented Aug 29, 2018

I am not sure why this needs to be an overload but not the other aspects. why wouldn't there be a AllocateLargeArrayInYoungGen overload too, then?

The uninitialized memory has security ramifications so you want to have an easy way to search for it. Generation hint has no security ramifications.

@Maoni0
Member Author

Maoni0 commented Aug 29, 2018

What makes you think that it is not? It is very natural to use these APIs to implement generic collections optimized for a large number of elements.

do you think the default is not good for "implementing generic collections with large number of elements" in general? I would think it is - you'd want the objects to be cleared so you don't deal with garbage; and most of the time if you have an object with large # of elements it should be on LOH, not gen0.

The uninitialized memory has security ramifications

ahh, yep, makes sense to single out APIs with security ramifications.

@jkotas
Member

jkotas commented Aug 29, 2018

The default is fine for most cases. This API is a workaround for cases where the default does not work well and turns into a bottleneck.

Large arrays are used mostly for buffers and collections. I think it is important that this API works well for specialized generic collections.

For example, the email thread from a few months ago that both of us are on had this real-world code fragment:

class ListEx<T> : IList<T>
{
    private T[][] Memory = null;

    public T this[int index]
    {
        get
        {
            // (bounds checking removed)
            return Memory[index / blockSize][index % blockSize];
        }

This code artificially allocates a number of smaller arrays to simulate a large array. It does so to avoid landing a short-lived large array in the Gen2 heap. The double indirection has non-trivial cost (the element access is several times slower). Changing the implementation of this ListEx<T> to use these APIs and avoid the double indirection should be straightforward. Also, when T does not contain object references, it is fine for the backing array to be uninitialized.
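A minimal sketch of what such a rewrite might look like. FlatList<T> is a hypothetical name, not code from the email thread, and the sketch assumes the GC.AllocateUninitializedArray<T>(int length, bool pinned = false) overload available in .NET 5 and later, which has no generation parameter, so the GC still decides where the array lands.

```csharp
using System;

// Hypothetical stand-in for ListEx<T>: one contiguous backing array instead
// of T[][], so an element access costs a single indirection.
public sealed class FlatList<T>
{
    private readonly T[] _items;
    private int _count;

    public FlatList(int capacity)
    {
        // Skips zeroing when T has no object references; when T does contain
        // references the runtime zero-initializes the array regardless.
        _items = GC.AllocateUninitializedArray<T>(capacity);
    }

    public int Count => _count;

    public void Add(T item)
    {
        if (_count == _items.Length)
            throw new InvalidOperationException("Capacity exceeded.");
        _items[_count++] = item;
    }

    public T this[int index]
    {
        get
        {
            // Only slots below _count have been written, so any
            // uninitialized memory past that point is never observed.
            if ((uint)index >= (uint)_count)
                throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }
}
```

Because reads are limited to indices below Count, the uninitialized tail of the backing array is never exposed to callers.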

@Maoni0
Member Author

Maoni0 commented Aug 29, 2018

😆 I see what the confusion was...by "generic" I meant "general cases" and you meant "code that implements generic collections". what I meant was this is not an API used in general cases so if it's a little harder to use I don't see that as a problem.

@tannergooding
Member

this is only meant for large array allocation, ie, arrays larger than the LOH threshold

@Maoni0, what would be the proposed behavior if the user attempts to create an array smaller than the LOH threshold?

@jkotas
Member

jkotas commented Aug 29, 2018

what I meant was this is not an API used in general cases so if it's a little harder to use I don't see that as a problem.

A little harder to use is fine. The unmanaged constraint would make it very hard to use in my ListEx example (you would have to use reflection to call the API). That is why I think the unmanaged constraint is not good for this API.

@benaadams
Member

what would be the proposed behavior if the user attempts to create an array smaller than the LOH threshold?

e.g.

_longLivedArray = AllocateLargeArray<Vector4>(length: 8000, generation: 2);

Where the goal is more to allocate straight to final generation

@msedi

msedi commented Aug 29, 2018

I like the suggestion but I'm curious about two things and some thoughts

  1. I thought the LOH has no generation, objects do not get promoted or compacted?
    I was under the impression that LOH objects are only compacted when I set the
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();

I must admit that I truly dislike this. In my opinion there should either be a deterministic method, e.g. GC.CompactLOH(), either blocking or nonblocking, or there should be a setting for how to handle the LOH in terms of GCing. For my applications I would prefer an approach where, if an LOH compaction is necessary because I'm running out of memory, the compaction takes place with manual interaction instead of just getting an OutOfMemoryException. So an LOHCompactingBehavior would be nice. Most of our objects are larger than 85k, more in the direction of 512x512x4. So if I don't implement my own mechanisms to call GC.Collect() and do a CompactOnce, the memory gets more and more fragmented even if I have enough memory, right?

  2. Another point is that currently the 85k is somehow an implementation detail not everyone is aware of.
    So I personally prefer the suggestion from @terrajobst with the AllocateArray, but I would rather place it where the developer expects it to be, namely in the Array. There is already a CreateInstance although not generic, but what would prevent you from putting it there?

In the end I might also be interested in not initializing the array even if it's not a LOH array. We have this a lot when loading data. I need to allocate a byte array first, which is immediately initialized to 0, but in the end I only need a container to overwrite again.

@jkotas
Member

jkotas commented Aug 29, 2018

I would rather place it where the developer expects it to be, namely in the Array

Array is a mainstream type. These are specialized methods for micro-managing the GC that we expect to be used rarely. We avoid placing specialized methods like this on mainstream types. For example, GC.GetGeneration could have been on object, but it is not for the same reason.

@jkotas
Member

jkotas commented Aug 29, 2018

Updated the proposal at the top with feedback incorporated. @Maoni0 Thoughts?

@msedi

msedi commented Aug 29, 2018

@jkotas: Somehow you are right. But from a certain point of view as a user I don't want to search through the API to find specialized things. I think it is not so seldom that people allocate more than 85k right? 85k is not such a big number so I guess there are many people out there using larger arrays without even knowing there is a difference, as the GC is not documented in as much detail as other "classes".

It would be interesting to see how many people know about these internals. Do you have a number on this?

To be honest, I'm fully OK if it's placed in the GC ;-) But I'm a fan of putting things together where they belong. Something like the GC and the GCSettings seems to me an artificial separation.

@jkotas
Member

jkotas commented Aug 29, 2018

I think it is not so seldom that people allocate more than 85k right?

Right. We believe that the right default for >85k arrays is to put them into Gen2. We do not expect a lot of .NET developers to worry about these internals. If they need to worry, we have failed.

The path by which folks discover these APIs is that they will find they have a GC performance issue in their app, find the root cause, and get to a documentation page that has suggestions for solving different GC performance issues. This API can be one of the suggestions; another suggestion can be array pooling.

@Maoni0
Member Author

Maoni0 commented Aug 29, 2018

@jkotas I think I misunderstood what you meant by "unmanaged constraints". you meant you don't want the users to have to figure out whether a type contains ref or not (and then call the API only if it doesn't contain refs). I do agree that would be a good thing. a (nit) comment I have on the new AllocateUninitializedArray API is the name sounds like it will for sure be uninitialized but in reality it will be initialized if it contains references and that (important part) isn't reflected in the name. but AllocateUninitializedArrayWhenAppropriate is probably too long.

I'd like to keep this API for allocating large objects only because I am not implementing a new new. our implementation for allocating a new object with new is heavily optimized and I am not duplicating all that logic. but that's ok for allocating a large object 'cause they are already expensive to allocate. of course there's a balance between the GC cost that this might save and the allocation perf. my worry for opening up this API for smaller objects is people may allocate way more in the old gen and end up reducing the total perf (ie, allocation is more expensive and more cost in GC).

@jkotas
Member

jkotas commented Aug 29, 2018

unmanaged constraints

The unmanaged constraint is a new C# language feature: https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.3/blittable.md

in reality it will be initialized if it contains references

In reality, it will be also initialized if the GC does not have a suitable block of memory to reuse. Naming is hard - I agree that AllocateUninitializedArrayWhenAppropriate feels too long.

I'd like to keep this API for only allocating large objects

Do you mean to enforce this (e.g. fail with exception when the size is less than X - what should X be?), or just provide guidance and log this in GC trace (I think we should have uses of these APIs in the GC trace anyway)? I think it should be just guidance and logging.

@Maoni0
Member Author

Maoni0 commented Aug 29, 2018

In reality, it will be also initialized if the GC does not have a suitable block of memory to reuse.

whether GC happens to have a suitable block of memory to use is completely unpredictable. the point is if it contains references, GC will make the guarantee that it's initialized; whereas if it doesn't contain references, GC will not make such a guarantee at all if you call this API.
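A small sketch of that contract from the caller's side, assuming the GC.AllocateUninitializedArray<T> overload available in .NET 5 and later. UninitializedDemo and its method names are hypothetical. For a type with references the result can be relied on to be zeroed; for a type without references the caller has to treat the contents as garbage and overwrite them before reading.

```csharp
using System;

public static class UninitializedDemo
{
    // T = string contains references, so every slot is guaranteed to be null.
    public static bool AllNull(int length)
    {
        string[] refs = GC.AllocateUninitializedArray<string>(length);
        foreach (string s in refs)
        {
            if (s is not null) return false;
        }
        return true;
    }

    // T = byte contains no references: no guarantee about the contents,
    // so fill the buffer before anything reads from it.
    public static byte[] AllocateAndFill(int length, byte value)
    {
        byte[] buffer = GC.AllocateUninitializedArray<byte>(length);
        buffer.AsSpan().Fill(value);
        return buffer;
    }
}
```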

Do you mean to enforce this (e.g. fail with exception when the size is less than X - what should X be?)

X is the LOH threshold which can be exposed as something the user can get. I don't have a very strong opinion whether to enforce this or not. I can see pros and cons for both. I lean towards enforcing but I can understand that users probably want the other way.

@jkotas
Member

jkotas commented Aug 29, 2018

I have seen cases where folks allocate several arrays (not necessarily above LOH threshold) and pin them for a very long time. The GC has to step around the pinned arrays, which causes perf issues if they are stuck in a bad place. This would be another case where this API may help and it is a reason for not enforcing the LOH threshold.

@Maoni0
Member Author

Maoni0 commented Aug 29, 2018

yep, that's certainly a good scenario - obviously it would require you to know the objects that will be pinned beforehand; a common situation with pinning is you allocate objects first, then decide to pin them some time later at which point the generation is already decided. but yes, if you do know at alloc time that would make a legit case to use this API.

discussions like this (ie, the kinds of scenarios you'd like to use this API for) are definitely welcome!
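For the "known at alloc time" case, a hedged sketch using the pinned parameter as it appears in GC.AllocateArray<T>(int length, bool pinned = false) in .NET 5 and later. PinnedBufferDemo is a hypothetical name. The array is pinned for its whole lifetime, so its address can be taken once and reused without a fixed block or a pinning handle.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedBufferDemo
{
    // Allocates a buffer that is pinned from birth and checks that a
    // full blocking collection does not relocate it.
    public static bool AddressSurvivesGc(int length)
    {
        byte[] buffer = GC.AllocateArray<byte>(length, pinned: true);

        IntPtr before = Marshal.UnsafeAddrOfPinnedArrayElement(buffer, 0);
        GC.Collect();
        IntPtr after = Marshal.UnsafeAddrOfPinnedArrayElement(buffer, 0);

        GC.KeepAlive(buffer);
        return before == after;
    }
}
```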

@saucecontrol
Member

saucecontrol commented Aug 29, 2018

The example @benaadams used is a good one. I make plenty of allocations under the LOH limit that I know in advance are going to be long-lived (and/or pinned at some point).

For that matter, it might be advantageous to have ArrayPool<T>/MemoryPool<T> allocate straight to gen2, even for the smallest arrays, since they're likely to live long enough to be promoted anyway.

@xoofx
Member

xoofx commented Sep 13, 2018

Love this proposal! Knowing the lifetime in advance and being able to allocate from the start where it is most efficient. On many occasions when allocating an array of structs that I knew would stay for the duration of an application, I had to allocate at least 85 KB to make sure that it was going to the LOH... being able to allocate smaller arrays directly to gen2 would be great.

Extra question: Would we have a way to pin this allocation after, knowing that it is on gen2 and that it would not move anymore for example? (usage: sharing caches between managed array and native code)

@Maoni0
Member Author

Maoni0 commented Sep 13, 2018

@xoofx being in gen2 doesn't mean it would not move anymore. and you can pin the object you get back just like you can pin any other object.
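As an illustration of "pin it like any other object", here is a sketch using the existing GCHandle API. PinLaterDemo is a hypothetical name, and the array passed in may live in any generation.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinLaterDemo
{
    // Pins an already-allocated array, checks that its address is stable
    // across a collection while the handle is held, then unpins it.
    public static bool PinAndCheck(int[] cache)
    {
        GCHandle handle = GCHandle.Alloc(cache, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            GC.Collect();
            // While pinned, the array must not have moved.
            return address == handle.AddrOfPinnedObject()
                && address == Marshal.UnsafeAddrOfPinnedArrayElement(cache, 0);
        }
        finally
        {
            // After Free() the GC is again allowed to relocate the array.
            handle.Free();
        }
    }
}
```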

@jkotas
Member

jkotas commented Jul 31, 2020

I see that for ref types, but not value types

Value types can contain reference type fields. For example, struct { int id; string name; } needs zero initialization because of the name field. In theory, you can avoid zero-initializing the id field in this case, but it is much easier and faster to zero-initialize everything in cases like this one. SkipLocalsInit works the same way.
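The distinction can be checked at run time with RuntimeHelpers.IsReferenceOrContainsReferences<T>(). The sketch below is only an illustration: the struct and class names are hypothetical, and it does not claim to show how the GC itself makes the decision.

```csharp
using System.Runtime.CompilerServices;

// Value type with a reference-type field: must always be zero-initialized.
public struct Named
{
    public int Id;
    public string Name;
}

// Value type with no references at any depth: zeroing may be skipped.
public struct Point
{
    public int X;
    public int Y;
}

public static class ReferenceCheckDemo
{
    // True when T is a reference type or contains one somewhere in its layout.
    public static bool NeedsZeroing<T>() =>
        RuntimeHelpers.IsReferenceOrContainsReferences<T>();
}
```

With these types, NeedsZeroing<Named>() is true and NeedsZeroing<Point>() is false.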

@abelbraaksma
Contributor

abelbraaksma commented Jul 31, 2020

Thanks, I should've been more specific and said "unmanaged" types. I meant a value type for which no members are reference types at any depth.

This behavior is how I would expect it, sounds perfect!

@danmoseley
Member

Moving to Future as this is not required for 5.0 as far as I can see.

@antonfirsov
Member

antonfirsov commented Nov 29, 2020

@jkotas @Maoni0 considering that .NET 6 will be LTS, any chance the alignment feature can get some priority to make it into that release?

It's quite useful for library developers working with SIMD. I'm commenting wearing ImageSharp 🎩, but could be handy for ML.NET folks, and a wide range of other libs in the ecosystem.

@adamsitnik
Member

The alignment support would help us a lot with #27408 (which blocks #24847, which we would love to have to complete our "super fast disk IO" story in .NET 6).

@Maoni0 @VSadov how much time would it take to implement it by a person that is not familiar with the GC code base like me?

@Maoni0
Member Author

Maoni0 commented Apr 29, 2021

@adamsitnik can you be a bit more specific? this is alignment in general or only for pinned objects?

@adamsitnik
Member

this is alignment in general or only for pinned objects?

@Maoni0 I would assume that we would implement it only for the pinned objects. Otherwise, we would need to somehow store the alignment size (and increase the size of every managed object?) and respect it when the objects are moved?
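Until an alignment parameter exists, a common workaround for pinned buffers is to over-allocate and skip to the first aligned byte. The sketch below is that workaround, not the proposed API; AlignedBufferDemo is a hypothetical name. It relies on the array being pinned, since an offset computed from the address is only meaningful if the array never moves.

```csharp
using System;
using System.Runtime.InteropServices;

public static class AlignedBufferDemo
{
    // Returns a pinned array plus the offset of the first element whose
    // address is a multiple of 'alignment'. At least 'length' bytes are
    // usable starting at that offset.
    public static (byte[] Buffer, int Offset) Allocate(int length, int alignment)
    {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0)
            throw new ArgumentException("Alignment must be a power of two.", nameof(alignment));

        // Over-allocate by alignment - 1 bytes so an aligned start always fits.
        byte[] buffer = GC.AllocateUninitializedArray<byte>(length + alignment - 1, pinned: true);

        long address = Marshal.UnsafeAddrOfPinnedArrayElement(buffer, 0).ToInt64();
        long mask = alignment - 1;
        int offset = (int)((alignment - (address & mask)) & mask);
        return (buffer, offset);
    }
}
```

A caller would then work with buffer.AsSpan(offset, length), at the cost of up to alignment - 1 wasted bytes per allocation.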

@aromaa
Contributor

aromaa commented Jun 12, 2021

As the #48117 got merged and enabled the definition "ALLOW_REFERENCES_IN_POH" in CoreCLR, would it make sense to lift the restrictions on disallowing references in the POH from the AllocateArray API? Based on the discussion above, the limitation was there to simplify the GC implementation.

For my use case especially, as you can't directly allocate single objects in the POH, I could instead allocate a "pinned pointer". Normally, if you want to give unmanaged code a pointer to the object, you need to pin it first, but this comes with the drawback of fragmenting the heap for long-lived objects.

To work around this, I was thinking of allocating a reference array in the POH, assigning the objects that need to be pinned there, and feeding the unmanaged code the pointer to the array offset. This way the objects can be freely moved in the managed heap but the array in the POH won't lose the reference to the actual object. Now to access the object, you can use indirection by first accessing the array to find the managed pointer and then getting the object.

And I know what you are thinking now, GC hole. But to avoid this, I'm just gonna call the managed code which then retrieves the object and manipulates it, thus avoiding the whole GC hole problem. The unmanaged code only needs the pointer to keep track of objects. I'm not worried about the overhead of the GC transition from unmanaged to managed, as I need to do other work in the managed code anyway.

It's not the ideal solution, but it prevents the fragmentation of the heap.

Another solution: because it would be difficult to introduce an API that allocates directly in the POH, would it make sense to think about an API that can force objects to be moved to the POH? This would of course be a bit "clumsy" as the object would first be allocated in gen 0 and then immediately moved to the POH, but the JIT could recognize this pattern and optimize for it (or not). Also it wouldn't be the prettiest looking code, but pinning happens so rarely anyway.

@AraHaan
Member

AraHaan commented Jun 12, 2021

Or another option: Allocate all memory on the unmanaged side and if you want to "extend" a pointer, pass that pointer on to a special function on the managed side to the unmanaged side and let it take care of the rest, then manipulate that data (write to it) as normal.

@aromaa
Contributor

aromaa commented Jun 12, 2021

That would force the data structures used to be blittable and in this case that wouldn't be possible. The data isn't needed on the unmanaged side, it's for the managed code, but the unmanaged code has to point to where to get that data.

At the moment I'm just passing id (int) to unmanaged code and then use that to retrieve the managed object from a dictionary (could also be an array for better performance). This obviously works just fine but I would rather deal with raw pointers for simplicity and performance.

This is just one use case I have but there are certainly way more valid usage cases for references in the POH. Also as the POH already seems to support references, it would seem to make sense to allow that (?), unless there's some blocking issue.

@webczat
Contributor

webczat commented Jun 12, 2021

Isn't this the use case for a normal GCHandle instead of a special convention to pass some index/pointer?

@aromaa
Contributor

aromaa commented Jun 12, 2021

The problem with GCHandles is that they fragment the heap and have some overhead due to the internal table needed to keep those objects as roots. The POH solves this issue by having a separate heap for those objects. So having hundreds of long-lived objects that are kept pinned ultimately results in a fragmented heap and increased memory usage, which isn't ideal.

@webczat
Contributor

webczat commented Jun 12, 2021

there is probably some misconception here. as I read it, you don't really need the objects themselves pinned, but you need to pass them to unmanaged code so that they can then be passed back to managed code and used. GCHandle solves this, and does not require pinning. I don't mean using it for pinning.
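A sketch of that round trip with a normal (non-pinning) handle. HandleRoundTripDemo and its method names are hypothetical; the GCHandle calls are the existing API. The IntPtr is an opaque token for native code to store and hand back, not an address of the object, so the object remains free to move.

```csharp
using System;
using System.Runtime.InteropServices;

public static class HandleRoundTripDemo
{
    // Creates a token that native code can hold on to. The object is kept
    // alive by the handle but is not pinned.
    public static IntPtr ToToken(object target) =>
        GCHandle.ToIntPtr(GCHandle.Alloc(target, GCHandleType.Normal));

    // Called when native code passes the token back into managed code.
    public static object FromToken(IntPtr token) =>
        GCHandle.FromIntPtr(token).Target;

    // Must be called exactly once per token, otherwise the object leaks.
    public static void Release(IntPtr token) =>
        GCHandle.FromIntPtr(token).Free();
}
```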

@aromaa
Contributor

aromaa commented Jun 12, 2021

Ah, right! You were speaking about normal GCHandle. That could also work for this, but as far as I remember, having a lot of GCHandles will cause problems by having long GC pauses. I could confirm this and see what happens.

@jkotas
Member

jkotas commented Jun 12, 2021

having a lot of GCHandles will cause problems by having long GC pauses

The problem is GCHandles that point to Gen0 objects. They are scanned during the stop-the-world pause in the current GC. It is no different from POH array slots pointing to Gen0 objects. They are scanned during the stop-the-world pause as well. The exact algorithm is different between the two so the absolute numbers will vary and also depend on the usage pattern. But it is safe to say that having a lot of either one will contribute to GC pause times.

@aromaa
Contributor

aromaa commented Jun 12, 2021

The problem is GCHandles that point to Gen0 objects

Why would only gen 0 objects be a problem? What about gen 1/2 objects? I couldn't find any differences between generations when doing some testing.

The exact algorithm is different between the two so the absolute numbers will vary and also depend on the usage pattern.

I was expecting the POH to be more lightweight and optimized for this and get the best performance.

Benchmarks

I'm not really sure what the ideal way to benchmark the differences between these two would be, and I'm unsure whether my benchmark reflects anything relevant in the real world, but here are my findings. N is the number of objects that were allocated; the benchmark code only does one call to GC.Collect(), and the objects are created beforehand in the iteration setup.

On the first test I created an object[] array in the pinned heap and then assigned objects to it.

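| Method  |      N |       Mean |     Error |    StdDev |     Median |
|-------- |-------:|-----------:|----------:|----------:|-----------:|
| DoGcPOH |   1000 |   129.6 us |  11.78 us |  33.61 us |   116.8 us |
| DoGcPOH |  10000 |   284.3 us |  20.00 us |  57.39 us |   279.9 us |
| DoGcPOH | 100000 | 1,667.7 us | 127.67 us | 368.37 us | 1,535.8 us |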

On the second test I had a GCHandle[] array and then assigned the created GCHandles to it (so I can clean them up afterwards).

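| Method      |      N |       Mean |     Error |    StdDev |     Median |
|------------ |-------:|-----------:|----------:|----------:|-----------:|
| DoGcHandles |   1000 |   230.8 us |  14.71 us |  41.50 us |   233.1 us |
| DoGcHandles |  10000 |   376.2 us |  32.82 us |  92.03 us |   336.5 us |
| DoGcHandles | 100000 | 2,691.3 us | 167.84 us | 489.59 us | 2,493.8 us |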

So from the looks of it, the POH is scanned much faster than the GCHandles, as expected. These objects were in generation 0 upon calling GC.Collect, but promoting them beforehand did not make any difference in the numbers.

Benchmark code

The benchmarks are split to two separate classes for simplicity.

The POH benchmark:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnostics.Windows.Configs;

//[MemoryDiagnoser]
//[EtwProfiler(performExtraBenchmarksRun: false)]
//[SimpleJob(targetCount: 1000)]
public class PohBenchmarks
{
	[Params(1_000, 10_000, 100_000)]
	public int N;

	private object[] handles;

	[GlobalSetup]
	public void GlobalSetup()
	{
		this.handles = GCHelpers.AllocateArray<object>(this.N);
	}

	[IterationSetup]
	public void IterationSetup()
	{
		GC.Collect();

		for (int i = 0; i < this.N; i++)
		{
			this.handles[i] = new();
		}
	}

	[Benchmark]
	public void DoGcPOH()
	{
		GC.Collect();
	}

	[IterationCleanup]
	public void IterationCleanup()
	{
		this.handles.AsSpan().Clear();
	}
}

internal static class GCHelpers
{
	private enum GC_ALLOC_FLAGS
	{
		GC_ALLOC_NO_FLAGS = 0,
		GC_ALLOC_ZEROING_OPTIONAL = 16,
		GC_ALLOC_PINNED_OBJECT_HEAP = 64,
	};
	
	private static readonly Func<IntPtr, int, GC_ALLOC_FLAGS, Array> AllocateNewArray = typeof(GC).GetMethod("AllocateNewArray", BindingFlags.Static | BindingFlags.NonPublic).CreateDelegate<Func<IntPtr, int, GC_ALLOC_FLAGS, Array>>();

	internal static T[] AllocateArray<T>(int length)
	{
		return Unsafe.As<T[]>(GCHelpers.AllocateNewArray(typeof(T[]).TypeHandle.Value, length, GC_ALLOC_FLAGS.GC_ALLOC_PINNED_OBJECT_HEAP));
	}
}

The GCHandle benchmark:

using System;
using System.Runtime.InteropServices;
using System.Threading;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnostics.Windows.Configs;

//[MemoryDiagnoser]
//[EtwProfiler(performExtraBenchmarksRun: false)]
//[SimpleJob(targetCount: 1000)]
public class GcHandleBenchmarks
{
	[Params(1_000, 10_000, 100_000)]
	public int N;

	private GCHandle[] handles;

	[GlobalSetup]
	public void GlobalSetup()
	{
		this.handles = new GCHandle[this.N];
	}

	[IterationSetup]
	public void IterationSetup()
	{
		GC.Collect();

		for (int i = 0; i < this.N; i++)
		{
			object instance = new();

			this.handles[i] = GCHandle.Alloc(instance, GCHandleType.Normal);
		}
	}

	[Benchmark]
	public void DoGcHandles()
	{
		GC.Collect();
	}
	
	[IterationCleanup]
	public void IterationCleanup()
	{
		int n = this.N;
		for (int i = 0; i < n; i++)
		{
			this.handles[i].Free();
			this.handles[i] = default;
		}
	}
}

@jkotas
Member

jkotas commented Jun 12, 2021

> Why would only gen 0 objects be a problem? What about gen 1/2 objects?

Gen2 -> Gen2 references are scanned in the background. The GC pause times are not affected by how many of them you have (as long as the background GC works as expected).

Gen2 -> Gen0 references are scanned during the stop-the-world pauses. The GC pause times are always affected by how many of them you have.

> POH is scanned much faster than the GCHandles,

GC scanning of arrays is optimized for scanning the whole array, and that works well for the usage pattern in your micro-benchmark. This is what I meant by "the exact algorithm is different between the two so the absolute numbers will vary and also depend on the usage pattern".
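A minimal sketch of how a Gen2 -> Gen0 reference comes about, for readers who want to reproduce the situation described above. The `holder` array and its size are hypothetical, and the generation numbers depend on when the runtime happens to collect, so no particular output is promised.

```csharp
using System;

// Hypothetical holder array; the name and size are for illustration only.
object[] holder = new object[1_000];

// Two blocking full collections normally promote the surviving array to gen2.
GC.Collect();
GC.Collect();

// Objects allocated now start in gen0, so each store below creates a
// reference from an older generation to a younger one.
for (int i = 0; i < holder.Length; i++)
{
    holder[i] = new object();
}

int holderGen = GC.GetGeneration(holder);
int elementGen = GC.GetGeneration(holder[0]);
Console.WriteLine($"holder is in gen{holderGen}, holder[0] is in gen{elementGen}");

// Keep the array (and therefore its elements) alive up to this point.
GC.KeepAlive(holder);
```

If the two generations printed differ, the next ephemeral GC has to visit those slots during its pause, which is the cost discussed in this comment.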

@aromaa
Contributor

aromaa commented Jun 12, 2021

> Gen2 -> Gen2 references are scanned in the background. The GC pause times are not affected by how many of them you have (as long as the background GC works as expected).

And I'm gonna assume that GC.Collect() forces the gen2 collection to be blocking even with concurrent GC, and that's why I never saw any differences? I did try the blocking: false parameter and got the same results, but the documentation mentions that it's not guaranteed, so 🤷.

Okay, this clears up my confusion, as I had the impression that GCHandles always affected the pause time by forcing the rooted objects to be scanned during the pause (even if they were in gen2). Looking forward to moving my stuff to use GCHandles.

Well, back to the topic. The question still remains: would it make sense to now allow references in the POH, as it's supported by the GC?
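For reference, a small sketch of the non-blocking request mentioned in this comment, using the `GC.Collect(int, GCCollectionMode, bool)` overload. It only shows how the call is made and how to inspect the current GC settings. Whether the collection actually runs in the background is up to the runtime, so the sketch does not try to prove that either way.

```csharp
using System;
using System.Runtime;

// GCLatencyMode.Batch means concurrent (background) GC is disabled;
// Interactive is the usual value when it is enabled.
Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}, server GC: {GCSettings.IsServerGC}");

int before = GC.CollectionCount(GC.MaxGeneration);

// blocking: false asks for a background collection where possible.
// The runtime is still free to perform a blocking collection instead.
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, blocking: false);

int after = GC.CollectionCount(GC.MaxGeneration);
Console.WriteLine($"gen{GC.MaxGeneration} collections: {before} -> {after}");
```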

@Maoni0
Member Author

Maoni0 commented Jun 15, 2021

the only reason why we allowed references is because of a usage in the runtime, not because there was a need for general usage. and it sounds like you are fine with using normal handles?

@aromaa
Contributor

aromaa commented Jun 15, 2021

I can understand that, and I was already a bit skeptical about my use case. It also looks like I forgot some details about it while going through the implementation during the refactor.

I forgot that I was actually pinning some data already in Gen 0, which I moved to the POH. What I have now is a blittable struct with a normal GCHandle and some other variables needed on the unmanaged side. The variables are accessed directly through pointers on the unmanaged side, and when the managed side is needed, the GCHandle is passed on.

Now, I could replace the GCHandle with a managed reference to avoid the indirection needed to do the GC table lookup.
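A rough sketch of the GCHandle round trip described in this comment, under the assumption that the unmanaged side simply stores the handle as an opaque pointer-sized value. The variable names are hypothetical and no native code is involved here.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical managed state that native code needs to refer back to.
object managedState = new object();

// A normal (non-pinning) handle keeps the object alive without fixing its address.
GCHandle handle = GCHandle.Alloc(managedState, GCHandleType.Normal);

// Pointer-sized value that could be stored in a field of a blittable struct.
IntPtr opaque = GCHandle.ToIntPtr(handle);

// Later, when control returns to managed code, the handle is rebuilt
// and the target is fetched through the handle table.
GCHandle recovered = GCHandle.FromIntPtr(opaque);
var target = recovered.Target;
bool same = ReferenceEquals(target, managedState);
Console.WriteLine(same);

// Handles are not collected automatically; free them when done.
handle.Free();
```

The `Target` lookup is the indirection mentioned above. A direct managed reference in a POH array would skip it, at the cost of the array slot being scanned by the GC.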

@deeprobin
Contributor

Would certainly be an interesting API for me.
Do we need major GC changes here or just C# changes and possibly minor GC changes?

@xoofx
Member

xoofx commented Jul 23, 2024

Hey, my apologies for reviving this topic, but I actually have a question related to the existing API GC.AllocateArray<T>, as the doc mentions:

> If pinned is set to true, T must not be a reference type or a type that contains object references.

But I have tried it with a reference type and it seems to work well (and from a quick look at the C++ code behind it, I don't see a check), so can I assume that this is allowed to pin an array of reference types?

@jkotas
Member

jkotas commented Jul 23, 2024

> can I assume that this is allowed to pin an array of reference types?

Yes, this was relaxed in #89293. Submitted dotnet/dotnet-api-docs#10142 to update the docs.
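A short sketch of what the relaxed behavior allows, using the public `GC.AllocateArray<T>(int length, bool pinned)` API. It assumes a runtime that already includes the change referenced above. On older runtimes the second allocation may throw instead.

```csharp
using System;

// Pinned allocation of an unmanaged element type, as originally documented.
byte[] buffer = GC.AllocateArray<byte>(4096, pinned: true);

// Pinned allocation of a reference type. This assumes the restriction on
// reference types has been lifted in the runtime being used.
object[] slots = GC.AllocateArray<object>(16, pinned: true);
slots[0] = "kept alive by the pinned array";

Console.WriteLine($"{buffer.Length} bytes, {slots.Length} slots");
```

Only the array itself is pinned in this case. The objects stored in `slots` are ordinary heap objects and can still be moved by the GC.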
