
The way I'm using cacache is very slow #71

Open · bryanlarsen opened this issue May 5, 2024 · 6 comments
Labels: question (Further information is requested)

@bryanlarsen
A cache read by key now takes ~30 seconds for my application.

A clue:

❯ sudo du -sh *
[sudo] password for blarsen: 
15M     content-v2
2.4G    index-v5
0       tmp

Usage pattern: write to a small number of keys (<10) every few seconds. On program start, read those keys.

The cache is used to dump state to disk so that it can be read on program start after unclean exit.
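For context, the pattern is roughly the following (a simplified sketch; the key name and cache path are made up, using the top-level cacache::write / cacache::read API):

// Simplified sketch of the usage pattern described above: dump a small
// amount of state every few seconds, read it back on program start.
// The key name ("app-state") and cache path are illustrative only.
use std::path::Path;

async fn dump_state(cache: &Path, state: &[u8]) -> Result<(), cacache::Error> {
    // Every write appends a new entry to this key's index file.
    cacache::write(cache, "app-state", state).await?;
    Ok(())
}

async fn restore_state(cache: &Path) -> Result<Vec<u8>, cacache::Error> {
    // On startup, read back the most recent entry for the key.
    cacache::read(cache, "app-state").await
}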

The index file for each key is about 280M, over 1M entries.

It appears that you're keeping the entire history? Is this just for reliability reasons? There doesn't appear to be an API to read older versions of a key. Is there a way to reliably trim the history and get my speed back?

@zkat (Owner)

zkat commented May 5, 2024

Index files are append-only in order to preserve the high-parallelism invariant.

In the JS version of cacache, I wrote a "garbage collector" that could be run "offline" (aka, when you can reasonably guarantee single-process, single-thread access to the cache), and it would iterate over all entries and reduce them to their latest entry value.

You can pretty trivially write this yourself using the functions in the index module: use cacache::index::ls to iterate over all the entries, cacache::index::delete to clear each key, then cacache::index::insert with a constructed WriteOpts (just for the options; you don't need to open a content file). It shouldn't be more than a few lines of code, in the end. LMK if you run into any trouble doing this!
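A minimal sketch of that offline pass might look like this (untested; the shapes of cacache::index::{ls, delete, insert} and of the entry struct are assumed from the names mentioned in this thread):

// Untested sketch of an offline "vacuum" pass: reduce every key to its
// latest index entry. Assumes single-process, single-thread access to the
// cache, and assumes cacache::index::{ls, delete, insert} behave as
// described above (ls yielding one latest-entry record per key).
fn vacuum(cache: &std::path::Path) -> Result<(), cacache::Error> {
    for entry in cacache::index::ls(cache) {
        let entry = entry?;

        // Drop the key's accumulated history...
        cacache::index::delete(cache, &entry.key)?;

        // ...then re-insert a single entry pointing at the same content.
        // WriteOpts only carries the options; no content file is opened.
        let mut opts = cacache::WriteOpts::new()
            .integrity(entry.integrity)
            .time(entry.time)
            .size(entry.size)
            .metadata(entry.metadata);
        if let Some(raw) = entry.raw_metadata {
            opts = opts.raw_metadata(raw);
        }
        cacache::index::insert(cache, &entry.key, opts)?;
    }
    Ok(())
}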

zkat added the question (Further information is requested) label on May 5, 2024
@bryanlarsen (Author)

> that could be run "offline"

That's a massive caveat for our use case.

> You can pretty trivially write this yourself using the functions in the index module: use cacache::index::ls to iterate over all the entries, cacache::index::delete to clear each key, then cacache::index::insert with a constructed WriteOpts (just for the options; you don't need to open a content file). It shouldn't be more than a few lines of code, in the end. LMK if you run into any trouble doing this!

And there's a window between the delete and the insert where the key's index entry is not present. We were using cacache to protect us from power failures and similar crashes.

@bryanlarsen (Author)

This doesn't seem to work:


// Per-key "vacuum" attempt on read: look up the latest entry, delete the
// key's history, then re-insert a single entry built from the same options.
if let Some(index) = cacache::index::find_async(state_path.as_std_path(), key).await? {
    debug!(?index);
    if let Err(err) = cacache::index::delete_async(state_path.as_std_path(), key).await {
        error!(key, ?err, "could not vacuum");
    }

    let hash = index.integrity.clone();

    // Rebuild WriteOpts from the existing entry; no content file is written.
    let mut write_opts = cacache::WriteOpts::new()
        .integrity(index.integrity)
        .time(index.time)
        .size(index.size)
        .metadata(index.metadata);
    if let Some(raw_metadata) = index.raw_metadata {
        write_opts = write_opts.raw_metadata(raw_metadata);
    }

    cacache::index::insert_async(state_path.as_std_path(), key, write_opts).await?;

    // Read the content back by hash; a missing entry maps to Ok(None).
    match cacache::read_hash(state_path, &hash).await {
        Ok(r) => Ok(Some(r)),
        Err(cacache::Error::EntryNotFound(..)) => Ok(None),
        Err(err) => Err(err).context("cacache hash not found"),
    }
} else {
    Ok(None)
}

It works as a replacement for cacache::read, but it doesn't seem to reduce the index size. Any hints?

Thanks.

@bryanlarsen (Author)

At first glance, index::delete is an insert(null), which is a no-op here?

@bryanlarsen (Author)

This works, but is neither pretty nor robust:

# Keep only the last two lines (the newest entries) of each index file,
# saving a .bak copy of the original before overwriting it.
for filename in $(find foo/v0/state.cacache/index-v5 -type f ! -name "*.bak") ; do
        tail -n 2 "${filename}" > "${filename}.trimmed"
        cp "${filename}" "${filename}.bak"
        mv "${filename}.trimmed" "${filename}"
done

@zkat (Owner)

zkat commented May 6, 2024

oh duh. I forgot that delete just inserts a null.

Yeah, I think it would be nice to have built-in "vacuum"/GC support. I just haven't gotten around to it.
