Add support for KLL sketch aggregation in minion jobs #14702

Open
wants to merge 1 commit into
base: master

raghavagrawal

CR Description

  1. Add support for KLL sketch aggregation in minion jobs.
  2. Add a new aggregator class, PercentileKLLSketchAggregator.java, as the minion value aggregator for KLL sketches.

Issue details

#14548

Testing details

Manually tested by creating a new table in a local setup: https://docs.google.com/document/d/1N6F9zF39YSGvGPENzhLwxpJQq54WpKn1hSsDnYU-jPo/edit?usp=sharing

codecov-commenter commented Dec 30, 2024

Codecov Report

Attention: Patch coverage is 0% with 16 lines in your changes missing coverage. Please review.

Project coverage is 63.82%. Comparing base (59551e4) to head (baf6a61).
Report is 1517 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...sing/aggregator/PercentileKLLSketchAggregator.java | 0.00% | 15 Missing ⚠️ |
| .../processing/aggregator/ValueAggregatorFactory.java | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14702      +/-   ##
============================================
+ Coverage     61.75%   63.82%   +2.07%     
- Complexity      207     1608    +1401     
============================================
  Files          2436     2704     +268     
  Lines        133233   150664   +17431     
  Branches      20636    23273    +2637     
============================================
+ Hits          82274    96163   +13889     
- Misses        44911    47316    +2405     
- Partials       6048     7185    +1137     

| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration | 100.00% <ø> (+99.99%) ⬆️ |
| integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.80% <0.00%> (+2.09%) ⬆️ |
| java-21 | 63.68% <0.00%> (+2.06%) ⬆️ |
| skip-bytebuffers-false | 63.82% <0.00%> (+2.07%) ⬆️ |
| skip-bytebuffers-true | 63.66% <0.00%> (+35.93%) ⬆️ |
| temurin | 63.82% <0.00%> (+2.07%) ⬆️ |
| unittests | 63.82% <0.00%> (+2.07%) ⬆️ |
| unittests1 | 56.25% <0.00%> (+9.36%) ⬆️ |
| unittests2 | 34.15% <0.00%> (+6.42%) ⬆️ |

yashmayya (Collaborator) left a comment

Thanks for the contribution @raghavagrawal! I've left a few (mostly minor) comments.

 */
public class PercentileKLLSketchAggregator implements ValueAggregator {

  protected static final int DEFAULT_K_VALUE = 200;

Can be removed here as you've added DEFAULT_KLL_SKETCH_K to the Helix common constants.

Comment on lines +132 to +135
// K is set to 200, for tradeoffs see datasketches library documentation:
// https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html#:~:
public static final int DEFAULT_KLL_SKETCH_K = 200;

Can you also update PercentileKLLAggregationFunction to use this default and remove the duplicate default from there?

@@ -32,4 +32,5 @@ private Constants() {
public static final String THETA_TUPLE_SKETCH_NOMINAL_ENTRIES = "nominalEntries";
public static final String PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY = "compressionFactor";
public static final String SUMPRECISION_PRECISION_KEY = "precision";
public static final String KLL_DOUBLE_SKETCH_K_VALUE = "kValue";

Maybe we can simply call this K, since that's the name DataSketches seems to use: https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html#:~:

Comment on lines +51 to +52
// If the functionParameters don't have an explicit K value set,
// use the default value for K

nit: this can be a single-line comment; the max line length is 120.
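
For illustration, the fallback that the code comment describes could look roughly like the sketch below. This is not the PR's code: the class `KllParams`, the method `resolveK`, and the use of a `Map<String, String>` for the function parameters are assumptions made for the example. Only the key `"kValue"` and the default of 200 come from the diff.

```java
import java.util.Map;

// Hypothetical helper; only the key "kValue" and the default 200 come from the diff.
final class KllParams {
  // Mirrors DEFAULT_KLL_SKETCH_K from the diff, defined once so every caller shares it.
  static final int DEFAULT_KLL_SKETCH_K = 200;
  // Mirrors KLL_DOUBLE_SKETCH_K_VALUE from the diff.
  static final String KLL_DOUBLE_SKETCH_K_VALUE = "kValue";

  private KllParams() {
  }

  // Returns the explicit K if the parameters contain one, otherwise the shared default.
  static int resolveK(Map<String, String> functionParameters) {
    String k = functionParameters.get(KLL_DOUBLE_SKETCH_K_VALUE);
    return k != null ? Integer.parseInt(k) : DEFAULT_KLL_SKETCH_K;
  }
}
```

With the default held in a single constant, the aggregator and the aggregation function can both read it instead of each declaring their own copy, which is what the earlier comments ask for.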

Comment on lines +59 to +64
if (first != null) {
  union.merge(first);
}
if (second != null) {
  union.merge(second);
}

When would these values be null?
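
To make the question concrete, here is a minimal sketch of a pairwise merge that tolerates missing inputs. Everything in it is an assumption for illustration: `ToySketch` is a hypothetical stand-in that stores every value (it is exact, not a KLL sketch, and not the DataSketches API), and `ToyMerger.mergeNullSafe` only mirrors the shape of the null checks in the diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a mergeable sketch; it keeps every value, so it is exact, not KLL.
final class ToySketch {
  private final List<Double> values = new ArrayList<>();

  void update(double value) {
    values.add(value);
  }

  // Folds all values of the other sketch into this one.
  void merge(ToySketch other) {
    values.addAll(other.values);
  }

  long getN() {
    return values.size();
  }

  // Nearest-rank quantile for a rank in [0, 1]; rejects an empty sketch.
  double getQuantile(double rank) {
    if (values.isEmpty()) {
      throw new IllegalStateException("sketch is empty");
    }
    List<Double> sorted = new ArrayList<>(values);
    Collections.sort(sorted);
    int index = (int) Math.ceil(rank * sorted.size()) - 1;
    return sorted.get(Math.max(0, Math.min(sorted.size() - 1, index)));
  }
}

final class ToyMerger {
  private ToyMerger() {
  }

  // Same shape as the diff: start from an empty union and skip inputs that are missing.
  static ToySketch mergeNullSafe(ToySketch first, ToySketch second) {
    ToySketch union = new ToySketch();
    if (first != null) {
      union.merge(first);
    }
    if (second != null) {
      union.merge(second);
    }
    return union;
  }
}
```

If the caller guarantees that both inputs are always present, the two checks never fire and can be dropped. Otherwise, a short comment stating when a null can arrive would answer the question directly.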
