MGMT-19573: Track release stats in installercache #7156
@paul-maidment: This pull request references MGMT-19573 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.19.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/cc @carbonin |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: paul-maidment The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
971481b
to
ab81637
Compare
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7156      +/-   ##
==========================================
+ Coverage   67.58%   67.73%   +0.14%
==========================================
  Files         296      296
  Lines       40235    40290      +55
==========================================
+ Hits        27193    27290      +97
+ Misses      10589    10544      -45
- Partials     2453     2456       +3
163f461
to
a242809
Compare
/retest |
7d19ddc
to
2b09f04
Compare
/retest |
/test okd-scos-e2e-aws-ovn |
Are you intending to track this through events or a metric? I could see something like installer cache misses and release usage duration as good metrics, but what you're doing here with timestamps feels more like an event. If you're intending to target only events, you can probably implement it as an event only rather than adding something to the metrics manager. @rccrdpccl what do you think? Should these be events or actual Prometheus metrics? |
My personal preference would be to use actual metrics, as they would be usable through more tools (like Grafana), but maybe we also do something clever there with events that I don't know about 🤷 |
I can understand this perspective, using multiple events instead of a single event with embedded metrics. The current approach does have some benefits though...
In terms of complexity for use in a query to calculate "concurrent releases"
I feel that, for our needs, the "metric events" approach is the simplest and easiest to work with. I understand the point about usability via more tools, but I would argue that for the "concurrent releases" report we would probably have to resort to SQL anyway, so the choice between "pure events" and "metric events" becomes less relevant. In these cases I am only interested in overlapping time intervals. An additional value indicating the "lifespan" of the request could be added to the metric event for convenience. |
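To make the "overlapping time intervals" idea concrete, here is a minimal sketch of how peak concurrency could be derived from per-request start and end times. It is an illustration only, not code from this PR: the `usage` type, `maxConcurrent`, and `sampleUsages` are hypothetical names, and the real report would more likely be a SQL query over the stored events.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// usage is a hypothetical record of one installer cache request.
type usage struct {
	ReleaseID  string
	Start, End time.Time
}

// maxConcurrent returns the largest number of usages whose
// [Start, End) intervals overlap at any instant (sweep-line approach).
func maxConcurrent(usages []usage) int {
	type edge struct {
		at    time.Time
		delta int
	}
	edges := make([]edge, 0, 2*len(usages))
	for _, u := range usages {
		edges = append(edges, edge{u.Start, +1}, edge{u.End, -1})
	}
	sort.Slice(edges, func(i, j int) bool {
		if edges[i].at.Equal(edges[j].at) {
			// Process releases before acquisitions at the same instant,
			// so back-to-back usages are not counted as overlapping.
			return edges[i].delta < edges[j].delta
		}
		return edges[i].at.Before(edges[j].at)
	})
	current, peak := 0, 0
	for _, e := range edges {
		current += e.delta
		if current > peak {
			peak = current
		}
	}
	return peak
}

// sampleUsages returns made-up data for the demonstration.
func sampleUsages() []usage {
	base := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	return []usage{
		{"release-a", base, base.Add(10 * time.Minute)},
		{"release-b", base.Add(5 * time.Minute), base.Add(15 * time.Minute)},
		{"release-c", base.Add(10 * time.Minute), base.Add(20 * time.Minute)},
	}
}

func main() {
	fmt.Println("peak concurrent releases:", maxConcurrent(sampleUsages()))
}
```

With the sample data, release-a and release-b overlap, and release-c starts exactly when release-a ends, so the peak is 2.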
/test okd-scos-e2e-aws-ovn |
/test edge-e2e-ai-operator-ztp |
/test okd-scos-e2e-aws-ovn |
/test okd-scos-e2e-aws-ovn |
It really depends on what question we are trying to answer.
Answering the first question with Prometheus can get tricky: if it's a gauge, we might miss the data because of the scraping interval; if it's a counter, it would get complicated to handle and represent. The second question would also be really awkward to answer through Prometheus. By querying events, however, both the implementation and the actual answer should be straightforward. Another factor that we should consider when choosing between metrics and events is alerting: we can alert on metrics, but we cannot alert on events. In this case I think it's not relevant.
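The scraping-interval concern can be illustrated with a small simulation. This is a sketch under stated assumptions: it does not use the Prometheus client, the `span` type and both functions are hypothetical, and time is modelled as whole seconds. It compares the true peak of a gauge-like value with the peak visible when the value is only sampled at scrape instants.

```go
package main

import "fmt"

// span is a hypothetical cache usage, in seconds since an arbitrary origin.
type span struct{ start, end int }

// activeAt counts the spans in use at second t (start inclusive, end exclusive).
func activeAt(spans []span, t int) int {
	n := 0
	for _, s := range spans {
		if s.start <= t && t < s.end {
			n++
		}
	}
	return n
}

// truePeak inspects every second in [0, horizon).
func truePeak(spans []span, horizon int) int {
	peak := 0
	for t := 0; t < horizon; t++ {
		if n := activeAt(spans, t); n > peak {
			peak = n
		}
	}
	return peak
}

// scrapedPeak only inspects the instants at which a scrape would happen.
func scrapedPeak(spans []span, horizon, interval int) int {
	if interval <= 0 {
		return 0
	}
	peak := 0
	for t := 0; t < horizon; t += interval {
		if n := activeAt(spans, t); n > peak {
			peak = n
		}
	}
	return peak
}

// sampleSpans returns made-up usages that all fall between scrapes.
func sampleSpans() []span {
	return []span{{5, 20}, {8, 25}, {40, 50}}
}

func main() {
	spans := sampleSpans()
	fmt.Println("true peak:", truePeak(spans, 60))
	fmt.Println("peak seen with 30s scrapes:", scrapedPeak(spans, 60, 30))
}
```

In this made-up data every usage starts and ends between two scrapes, so the sampled view reports a peak of 0 even though two releases were in use at once.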
100% agree. Metrics manager should serve as a facade for Prometheus metrics, and although it's coupled with events handler, IMO it really shouldn't and we should try to avoid that approach.
@paul-maidment why do we need to enable/disable this metric? I do not see the benefit in doing so |
I can understand this perspective. I will update this to use the events api directly. |
2b09f04
to
6701090
Compare
b01e34a
to
a5f40c7
Compare
/retest |
a5f40c7
to
56ad1b6
Compare
To clarify, the limit of 1000 events per week was derived from the following query against production Grafana.
For the last 7 days, that count is 804. This represents every time that cluster preparation was started. The main point is that we have no more than 1000 events in a given week; this is low volume, so we are not at risk of an events explosion or anything of that nature. |
56ad1b6
to
d7747a7
Compare
/retest |
/hold |
d7747a7
to
a5489fd
Compare
/unhold |
/lgtm |
To support Ephemeral storage improvement efforts in MGMT-13917, it is desirable to have some statistics from the installer cache:
ReleaseId: The release being downloaded
Cached: Was the release found in the cache (true) or did it need to be downloaded/extracted (false)?
StartTime: The start time of the install cache request
EndTime: The time at which the caller finished using the request
ExtractDuration: The time taken to extract the file in seconds (zero if no extraction took place)
This will be sent in a single metrics event; the event time of the event represents the time at which the release was no longer needed by callers. All of the above should be sufficient to give visibility into the lifespan and concurrency of installer cache entries.
In addition, it was requested that we be able to enable/disable the recording of metrics for this feature, so I have implemented this through the metricsManager as a list of metrics that we would like to block.
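As a rough sketch of what such an event payload might look like, the fields listed above could be carried in a struct like the one below. This is an assumption for illustration, not the code in this PR: the type name, the JSON key names, the Go-style `ReleaseID` spelling, and the `lifespan` helper are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// releaseStats mirrors the fields named in the commit message.
// The JSON key names are assumptions made for this sketch.
type releaseStats struct {
	ReleaseID       string    `json:"release_id"`
	Cached          bool      `json:"cached"`
	StartTime       time.Time `json:"start_time"`
	EndTime         time.Time `json:"end_time"`
	ExtractDuration float64   `json:"extract_duration"` // seconds, zero if nothing was extracted
}

// lifespan is how long the caller held on to the release.
func (s releaseStats) lifespan() time.Duration {
	return s.EndTime.Sub(s.StartTime)
}

// roundTrip serializes and deserializes the stats, as an event
// consumer might when reading the payload back.
func roundTrip(s releaseStats) (releaseStats, error) {
	raw, err := json.Marshal(s)
	if err != nil {
		return releaseStats{}, err
	}
	var out releaseStats
	err = json.Unmarshal(raw, &out)
	return out, err
}

// sampleStats returns made-up values for the demonstration.
func sampleStats() releaseStats {
	start := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)
	return releaseStats{
		ReleaseID:       "example-release",
		Cached:          false,
		StartTime:       start,
		EndTime:         start.Add(90 * time.Second),
		ExtractDuration: 12.5,
	}
}

func main() {
	s := sampleStats()
	raw, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
	fmt.Println("lifespan:", s.lifespan())
}
```

Keeping both timestamps in one payload means the lifespan and any overlap between requests can be computed later without correlating separate start and end events.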
a5489fd
to
5d29c9b
Compare
/lgtm |
@paul-maidment: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
[ART PR BUILD NOTIFIER] Distgit: ose-agent-installer-api-server |
To support Ephemeral storage improvement efforts in MGMT-13917, it is desirable to have some statistics from the installer cache.
This will be sent in a single metrics event; the event time of the event represents the time at which the release was no longer needed by callers. All of the above should be sufficient to give visibility into the lifespan and concurrency of installer cache entries.
In addition, it was requested that we be able to enable/disable the recording of metrics for this feature, so I have implemented this through the metricsManager as a list of metrics that we would like to block.
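A "list of metrics that we would like to block" could be as simple as a set lookup before recording. The sketch below is only an illustration of that idea under assumed names; `metricsFilter`, `newMetricsFilter`, `enabled`, and the metric names are hypothetical and are not the metricsManager API.

```go
package main

import "fmt"

// metricsFilter holds the names of metrics that should not be recorded.
type metricsFilter struct {
	blocked map[string]struct{}
}

// newMetricsFilter builds a filter from a list of blocked metric names.
func newMetricsFilter(blocked []string) *metricsFilter {
	set := make(map[string]struct{}, len(blocked))
	for _, name := range blocked {
		set[name] = struct{}{}
	}
	return &metricsFilter{blocked: set}
}

// enabled reports whether a metric with the given name may be recorded.
func (f *metricsFilter) enabled(name string) bool {
	_, isBlocked := f.blocked[name]
	return !isBlocked
}

func main() {
	// The metric names here are made up for the example.
	f := newMetricsFilter([]string{"example_release_stats"})
	for _, name := range []string{"example_release_stats", "example_other_metric"} {
		fmt.Printf("%s enabled: %v\n", name, f.enabled(name))
	}
}
```

A block list keeps every metric enabled by default, so only the names that are explicitly listed need configuration.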