-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for KLL sketch aggregation in minion jobs #14702
base: master
Are you sure you want to change the base?
Conversation
.../java/org/apache/pinot/core/segment/processing/aggregator/PercentileKLLSketchAggregator.java
Outdated
Show resolved
Hide resolved
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #14702 +/- ##
============================================
+ Coverage 61.75% 63.82% +2.07%
- Complexity 207 1608 +1401
============================================
Files 2436 2704 +268
Lines 133233 150664 +17431
Branches 20636 23273 +2637
============================================
+ Hits 82274 96163 +13889
- Misses 44911 47316 +2405
- Partials 6048 7185 +1137
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @raghavagrawal! I've left a few (mostly minor) comments.
*/ | ||
public class PercentileKLLSketchAggregator implements ValueAggregator { | ||
|
||
protected static final int DEFAULT_K_VALUE = 200; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be removed here as you've added DEFAULT_KLL_SKETCH_K
to the Helix common constants.
// K is set to 200, for tradeoffs see datasketches library documentation: | ||
// https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html#:~: | ||
public static final int DEFAULT_KLL_SKETCH_K = 200; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also update PercentileKLLAggregationFunction
to use this default and remove the duplicate default from there?
@@ -32,4 +32,5 @@ private Constants() { | |||
public static final String THETA_TUPLE_SKETCH_NOMINAL_ENTRIES = "nominalEntries"; | |||
public static final String PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY = "compressionFactor"; | |||
public static final String SUMPRECISION_PRECISION_KEY = "precision"; | |||
public static final String KLL_DOUBLE_SKETCH_K_VALUE = "kValue"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can simply call this K
as that's what DataSketches seems to use - https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html#:~:?
// If the functionParameters don't have an explicit K value set, | ||
// use the default value for K |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can be a single line comment, the max line length is 120.
if (first != null) { | ||
union.merge(first); | ||
} | ||
if (second != null) { | ||
union.merge(second); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would these values be null
?
CR Description
Issue details
#14548
Testing details
Manual testing done by creating new table in local setup: https://docs.google.com/document/d/1N6F9zF39YSGvGPENzhLwxpJQq54WpKn1hSsDnYU-jPo/edit?usp=sharing