Adjust the Workload's Runtime Budgets #836
Based on the inaugural AlgoPerf competition results, we believe we can adjust the per-workload runtime budgets, mostly to reduce the required computational resources without significantly affecting the meaningfulness of the results.

[Figure: Normalized submission runtimes across workloads, one panel per ruleset (External Tuning and Self-Tuning)]

**External Tuning Ruleset**

The main motivation for reducing the runtime budgets is to save compute resources. With the following plot, we can look at how much compute we can save (y-axis) without (substantially) affecting the benchmark results. Plot details: the x-axis investigates hypothetical runtime budget cuts at given percentages of the original budget, ranging from 100% (use the original budget) down to the smallest possible one (i.e. the fastest a submission reached the target on a single run). The blue line denotes the total required computation (on this workload) as a percentage of the original cost, i.e. at x=100% we always have 100% of the cost. The blue line will always be above the gray dashed line, which indicates the identity, since training runs stopped once they hit the target (or failed with a NaN), thus reducing the total compute. I bolded my preferred option, but would be happy to discuss it.
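For concreteness, here is a minimal sketch of how such a cost curve can be computed; the function name `relative_cost_curve` and the assumption that per-run runtimes (clipped at the original budget) are available are mine, not part of the benchmark code.

```python
import numpy as np

def relative_cost_curve(runtimes, original_budget, num_points=50):
    """Total compute under hypothetical budget cuts, as a fraction of the
    original total compute (the blue line described above)."""
    # Runs stop when they hit the target or fail with a NaN, so observed
    # runtimes are at most the original budget.
    runtimes = np.minimum(np.asarray(runtimes, dtype=float), original_budget)
    original_cost = runtimes.sum()
    # Budget cuts from the fastest single run up to the full budget.
    cuts = np.linspace(runtimes.min() / original_budget, 1.0, num_points)
    costs = np.array([
        np.minimum(runtimes, cut * original_budget).sum() / original_cost
        for cut in cuts
    ])
    return cuts, costs  # costs >= cuts, i.e. the curve sits on/above the identity
```

Runs that already finished below a given cut are unaffected by it, which is why the curve can only sit on or above the identity line.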
**Criteo 1TB**

This likely gives us four options (from least to most competitive):

**fastMRI**

Due to our 4x rule (i.e. submissions get an infinite score when above 4x the fastest submission), we should not reduce the budget below 52% (fastest submission: CASPR Adaptive with 13%; 4 × 13% = 52%); the floor arithmetic is sketched below.
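The 52% floor is just the 4x rule applied to the fastest observed run. A trivial helper (hypothetical naming, not from the repo) makes the arithmetic explicit:

```python
def min_safe_budget(fastest_fraction, rule_factor=4.0):
    """Smallest budget (as a fraction of the original) under which every
    submission within `rule_factor`x of the fastest can still finish.
    Anything slower scores infinity anyway, so budget beyond this floor
    only pays for runs that cannot receive a finite score."""
    return rule_factor * fastest_fraction

print(min_safe_budget(0.13))  # 0.52 -> fastMRI floor of 52%
```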
Given that the slowest submission took 33%, we have (roughly) the following two options:

**ResNet**

Discussed in a separate comment below.

**ViT**

ViT shows a very large spread between the fastest submission and the baseline. We could thus consider the following two options:
**Conformer**

Here, I see the following options:
**DeepSpeech**
**OGBG**

Due to our 4x rule (i.e. submissions get an infinite score when above 4x the fastest submission), we should not reduce the budget below 48% (fastest submission: CASPR Adaptive with 12%; 4 × 12% = 48%). However, since the fastest submission might not necessarily be part of the next round of submissions, we could also consider 72% (since Shampoo required 18%); both floors are reproduced in the snippet below.
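Plugging the OGBG numbers quoted above into the same (hypothetical) `min_safe_budget` helper from the fastMRI section reproduces both candidate floors:

```python
print(min_safe_budget(0.12))  # 0.48 -> 48% floor if CASPR Adaptive (12%) returns
print(min_safe_budget(0.18))  # 0.72 -> 72% floor based on Shampoo (18%)
```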
I see the following options:

**WMT**
**ResNet**

Here, reducing the budget is not really an option. However, we did consider increasing the budget, as currently only a single submission hit the target (across both rulesets). Increasing the budget slightly, e.g. by 5%, could result in more submissions hitting the target, giving us a less binary score for ResNet. Hitting the target on ResNet would result in a benchmark score increase of roughly 0.125 (i.e. 1/8, since the workload score on ResNet would then be roughly the same as the fastest submission's, raising the mean over the 8 workloads by about 1/8).
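As a back-of-envelope check of the 0.125 figure: assuming, as the 1/8 implies, that the benchmark score is roughly a mean over the 8 workload scores (a simplification of the actual performance-profile scoring), and using made-up per-workload scores:

```python
# Hypothetical per-workload scores; ResNet (last entry) currently scores 0
# because the target is missed within the budget.
scores = [0.9, 0.8, 0.7, 0.85, 0.6, 0.75, 0.95, 0.0]
before = sum(scores) / len(scores)

scores[-1] = 1.0  # ResNet now hit roughly as fast as the fastest submission
after = sum(scores) / len(scores)

print(round(after - before, 3))  # 0.125
```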
Thanks for preparing the hard numbers here! I am in agreement on all suggestions. As for ResNet, as I mentioned in the meeting, increasing the budget by 5% will result in more repeatable, lower-variance results across the board, which I am in favor of. Right now, the performance profiles are very dependent on the seed values used for the runs on ResNet, which is undesirable.