Make GitFileSystemObjectSink multi-threaded #12087

Draft: wants to merge 4 commits into master

edolstra (Member) commented Dec 19, 2024

Motivation

This speeds up importing tarballs into the Git cache.

Time to import a nixpkgs tarball on Nix master:

# rm -rf ~/.cache/nix/tarball-cache/; command time nix flake metadata tarball+file:///.../master.tar.gz
20.00user 2.41system 0:12.13elapsed 184%CPU (0avgtext+0avgdata 480952maxresident)k
439396inputs+462197outputs (13910major+163516minor)pagefaults 0swaps

With this PR:

4.59user 9.37system 0:03.68elapsed 379%CPU (0avgtext+0avgdata 229920maxresident)k
499429inputs+473245outputs (0major+82815minor)pagefaults 0swaps

TODO: this currently buffers all file contents in memory before writing them to disk, so it's no longer constant memory. But this could be fixed pretty easily (e.g. if there are more than N bytes worth of unwritten files, then wait).
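
The "wait if more than N bytes are unwritten" idea from the TODO could be sketched roughly as follows. This is only an illustration, not code from this PR: the class `ByteBudget` and its methods are hypothetical names, and it assumes one producer thread that buffers file contents and worker threads that call `release()` once a file has been written.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical helper (not part of Nix): caps the number of bytes that are
// buffered in memory but not yet written to disk.
class ByteBudget
{
    std::mutex mtx;
    std::condition_variable freed;
    const std::size_t maxBytes;  // the "N" from the TODO
    std::size_t inFlight = 0;    // bytes buffered but not yet written

public:
    explicit ByteBudget(std::size_t maxBytes) : maxBytes(maxBytes) {}

    // Called by the producer before buffering a file of `size` bytes.
    // Blocks while the budget is exhausted. A file larger than the whole
    // budget is admitted once nothing else is in flight, so it cannot
    // wait forever.
    void acquire(std::size_t size)
    {
        std::unique_lock<std::mutex> lock(mtx);
        freed.wait(lock, [&] { return inFlight == 0 || inFlight + size <= maxBytes; });
        inFlight += size;
    }

    // Called by a worker after the file has been written to disk.
    void release(std::size_t size)
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            assert(size <= inFlight);
            inFlight -= size;
        }
        freed.notify_all();
    }

    // Number of bytes currently buffered and unwritten.
    std::size_t pending()
    {
        std::lock_guard<std::mutex> lock(mtx);
        return inFlight;
    }
};
```

With a budget like this, peak memory is bounded by roughly N plus the size of the largest single file, rather than by the size of the whole tarball.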


@github-actions bot added the labels "with-tests" (Issues related to testing. PRs with tests have some priority) and "fetching" (Networking with the outside (non-Nix) world, input locking) on Dec 19, 2024
@edolstra force-pushed the multithreaded-git-sink branch from 4e96f92 to 4bf9371 on January 15, 2025 18:55
{
    initLibGit2();

    initRepoAtomically(path, bare);
    if (git_repository_open(Setter(repo), path.string().c_str()))
        throw Error("opening Git repository %s: %s", path, git_error_last()->message);

#if 0

This would explain the 3.9× increase in system time (2.41s to 9.37s), since many individual blob files are now created.

Why did you disable this?
Could this be re-enabled, considering that you've changed it to multiple GitRepo instances in the next commit?

roberth (Member) commented Jan 24, 2025

Removing the packbuilder makes this a bit of an apples-to-oranges comparison. Some users don't have great I/O on their systems and would be severely impacted by its removal, so I assume the removal was temporary.

The packbuilder is already configured to use multiple threads.
Possible bottlenecks and mitigations could be:

  • packfile compression: it's already multi-threaded, but it could use a faster strategy
  • filling the packbuilder: synchronicity between producer and consumer; use distinct threads and a queue

A bounded queue may also help with memory pressure, at least if the individual files aren't too big.

I would try to avoid using multiple packbuilders, because that means creating k times more packfiles.
Maybe that's still ok, but requires that we reindex a bit sooner (#11444).
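
The producer/consumer split with a bounded queue suggested above could look roughly like the sketch below. It is an illustration under stated assumptions, not Nix code: `BoundedQueue` and `pumpThroughQueue` are hypothetical names, the producer stands in for the tarball parser, and the single consumer stands in for the thread that would feed the one packbuilder (here it only counts bytes).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded blocking queue: push() blocks while the queue is full,
// which bounds how many buffered files exist at any time.
template<typename T>
class BoundedQueue
{
    std::mutex mtx;
    std::condition_variable notFull, notEmpty;
    std::deque<T> items;
    const std::size_t capacity;
    bool closed = false;

public:
    explicit BoundedQueue(std::size_t capacity) : capacity(capacity)
    {
        assert(capacity > 0); // a zero-capacity queue would block forever
    }

    void push(T item)
    {
        std::unique_lock<std::mutex> lock(mtx);
        notFull.wait(lock, [&] { return items.size() < capacity; });
        items.push_back(std::move(item));
        lock.unlock();
        notEmpty.notify_one();
    }

    // Returns std::nullopt once the queue is closed and drained.
    std::optional<T> pop()
    {
        std::unique_lock<std::mutex> lock(mtx);
        notEmpty.wait(lock, [&] { return !items.empty() || closed; });
        if (items.empty())
            return std::nullopt;
        T item = std::move(items.front());
        items.pop_front();
        lock.unlock();
        notFull.notify_one();
        return item;
    }

    // Signals that no more items will be pushed.
    void close()
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            closed = true;
        }
        notEmpty.notify_all();
    }
};

// One producer (this thread) and one consumer thread.
// Returns the total number of bytes the consumer received.
std::size_t pumpThroughQueue(const std::vector<std::string> & files, std::size_t capacity)
{
    BoundedQueue<std::string> queue(capacity);
    std::size_t bytesConsumed = 0;

    std::thread consumer([&] {
        while (auto file = queue.pop())
            bytesConsumed += file->size(); // real code would insert the blob into the packbuilder here
    });

    for (const auto & file : files)
        queue.push(file);
    queue.close();
    consumer.join(); // after join, reading bytesConsumed is safe
    return bytesConsumed;
}
```

Because the capacity here counts items rather than bytes, it limits memory only when individual files are not too big, which matches the caveat above.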
