-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache object stores per scheme + bucket #99
base: main
Are you sure you want to change the base?
Conversation
@@ -4,5 +4,6 @@ trap "echo 'Caught termination signal. Exiting...'; exit 0" SIGINT SIGTERM | |||
|
|||
# create azurite container | |||
az storage container create -n $AZURE_TEST_CONTAINER_NAME --connection-string $AZURE_STORAGE_CONNECTION_STRING | |||
az storage container create -n ${AZURE_TEST_CONTAINER_NAME}2 --connection-string $AZURE_STORAGE_CONNECTION_STRING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created one more container to test object store cache per (schema, container) pair.
39bbefe
to
17d4aa8
Compare
let expiration_warn_msg = | ||
format!("credentials for {bucket} expired at {expire_at:?}"); | ||
|
||
let expiration_hint = "New credentials will be automatically retrieved if configured.\n\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if new credentials will be automatically retrieved, do we need to throw a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wanted to give hint to refresh the session but seems like a bad spot. (retry should not warn if it succeeds)
|
||
if let Some(item) = self.cache.get(&key) { | ||
if item.expired(&key.bucket) { | ||
self.cache.remove(&key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this release all the memory associated with the object store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, impl Drop
is called for each field in the structs.
b8f9061
to
80f9a11
Compare
17d4aa8
to
9d90243
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doing it once per session is still not very ideal. Is there an easy way to share across sessions?
b8856c4
to
852b6af
Compare
In this PR, we cache object stores per scheme (e.g. s3) + bucket (container) combination in each Postgres session. This will reduce authentication costs by only doing it at the first time. object_store does not perform sts assume_role to get temp token, so pg_parquet make use of aws sdk to perform it. And then configure object_store with the temp token that it fetched. This is why, pg_parquet checks expiration of the tokens for s3 store. It will fetch the temp token if it expires. (obviously if you configured temp token auth via config) For azure blob store, we do not need to recreate tokens because object_store handles it.
9d90243
to
a493016
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #99 +/- ##
==========================================
+ Coverage 91.82% 91.88% +0.06%
==========================================
Files 77 77
Lines 10023 10320 +297
==========================================
+ Hits 9204 9483 +279
- Misses 819 837 +18 ☔ View full report in Codecov by Sentry. |
Not easily doable as we have We may try to persist temp tokens across a postgres shared memory hash map. Object stores (client connections inside them) would be created per PG session (once per bucket) but we would reuse the temp tokens across the sessions. TL;DR; seems hard to share connections but we may try to share temp tokens across the sessions |
In this PR, we cache object stores per scheme (e.g. s3) + bucket (or container) combination in each Postgres session. This will reduce authentication costs by only doing it at the first time.
object_store does not perform sts assume_role to get temp token, so pg_parquet make use of aws sdk to perform it. And then configure object_store with the temp token that it fetched. This is why, pg_parquet checks expiration of the tokens for s3 store. It will fetch the temp token if it expires. (obviously if you configured temp token auth via config)
For azure blob store, we do not need to recreate tokens because object_store handles it.
Closes #93