feat: Add configuration options for PK chunking to help w/the initial sync of large/wide tables #52
I ran into two problems when syncing tables with the Bulk API.
Note
table in one of our SF accounts must be associated w/a large number of other objects, b/c we get this error: InvalidBatch : Failed to process query: OPERATION_TOO_LARGE: exceeded 20000 distinct ids
Addressing the first problem
We found it helpful to enable PK chunking for specific tables from the get-go, rather than wait for the tap to fail over to PK chunking after a query timeout. The tables in question are fairly large and have lots of columns, and we have to use PK chunking w/them elsewhere.
The existing code is supposed to fail over to PK chunking when a query timeout occurs, but it didn't in our case. I believe the behavior of the API may have changed, based on what the original tap-salesforce is doing now (and that seems outdated too, 15 vs 30 retries).
I'd like to submit a separate PR to address the fail over problem.
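To illustrate the up-front opt-in described above, here is a minimal sketch of the per-table decision. The function name, the `pk_chunked_tables` argument, and the case-insensitive matching are assumptions made for illustration, not the names or behavior used in this PR.

```python
from typing import Iterable


def should_pk_chunk_up_front(stream_name: str, pk_chunked_tables: Iterable[str]) -> bool:
    """Return True if this stream is listed for PK chunking from the first query.

    `pk_chunked_tables` is a hypothetical list of table names taken from the
    tap's config. Names are compared case-insensitively (an assumption).
    """
    wanted = {name.lower() for name in pk_chunked_tables}
    return stream_name.lower() in wanted


# Hypothetical usage: only the listed tables skip the "wait for a timeout" path.
for stream in ("Account", "Lead"):
    mode = "PK chunking" if should_pk_chunk_up_front(stream, ["account"]) else "regular bulk query"
    print(f"{stream}: {mode}")
```

Tables that are not listed would keep the existing behavior, so the opt-in only affects the streams that are known to be problematic.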
Addressing the second one
We got the "OPERATION_TOO_LARGE" error with and without PK chunking. One of the suggested solutions is to run a smaller query, and in our case a smaller chunk size avoided the error.
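As a rough sketch of what a smaller chunk size means at the request level: the Bulk API takes PK chunking settings through the `Sforce-Enable-PKChunking` header on job creation. The helper name below is hypothetical, and the 250,000 upper bound is the documented maximum as far as I know, so treat it as an assumption to verify.

```python
from typing import Dict, Optional

# Assumed upper bound on chunkSize per the Bulk API docs; verify before relying on it.
MAX_CHUNK_SIZE = 250_000


def pk_chunking_header(chunk_size: Optional[int] = None) -> Dict[str, str]:
    """Build the PK chunking header for a Bulk API job (hypothetical helper).

    With no chunk_size, Salesforce's default chunk size applies.
    """
    if chunk_size is None:
        return {"Sforce-Enable-PKChunking": "true"}
    if not 1 <= chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError(f"chunk_size must be between 1 and {MAX_CHUNK_SIZE}")
    return {"Sforce-Enable-PKChunking": f"chunkSize={chunk_size}"}


# A smaller chunk keeps each batch's query small enough to stay under the limit.
print(pk_chunking_header(25_000))
```

The right chunk size depends on the table; the value above is only an example, not a recommendation from this PR.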
Configuration options for PK chunking
I'll happily submit PRs to update the docs. I added 4 config options:
Intent
This is intended to be used during an initial sync for specific problematic tables, and then disabled for subsequent runs.