Adjust the Workload's Runtime Budgets #836

Open
fsschneider opened this issue Jan 20, 2025 · 3 comments
fsschneider commented Jan 20, 2025

Based on the inaugural AlgoPerf competition results, we believe we can adjust the per-workload runtime budgets, mostly to reduce the required computational resources without significantly affecting the meaningfulness of the results.

Normalized submission runtimes across workloads

External Tuning

| Submission | Criteo 1TB | fastMRI | ResNet | ViT | Conformer | DeepSpeech | OGBG | WMT |
|---|---|---|---|---|---|---|---|---|
| Amos | inf | 0.33 | inf | 0.65 | 0.71 | 0.57 | 0.60 | 0.68 |
| Baseline | 0.94 | 0.23 | inf | 0.91 | 0.90 | 0.65 | 0.42 | 0.86 |
| CASPR Adaptive | NaN | 0.13 | inf | 0.58 | inf | 0.75 | 0.12 | 0.67 |
| Cyclic LR | 0.67 | 0.25 | inf | 0.81 | 0.94 | 0.70 | 0.38 | 0.49 |
| Generalized Adam | 0.83 | 0.18 | 0.97 | 0.84 | inf | 0.68 | 0.31 | 0.63 |
| LAWA EMA | 0.69 | 0.29 | inf | 0.80 | inf | inf | 0.57 | 0.89 |
| LAWA Queue | inf | 0.22 | inf | 0.66 | inf | inf | 0.25 | 0.56 |
| NadamP | 0.80 | 0.22 | inf | 0.88 | 0.94 | 0.60 | 0.43 | 0.80 |
| Schedule Free AdamW | 0.67 | 0.13 | inf | 0.57 | 0.92 | 0.78 | 0.29 | 0.33 |
| Schedule Free Prodigy | NaN | 0.21 | inf | inf | inf | inf | 0.61 | inf |
| PyTorch Distr. Shampoo | 0.65 | 0.15 | inf | 0.43 | 0.78 | 0.62 | 0.18 | 0.80 |

Self-Tuning

| Submission | Criteo 1TB | fastMRI | ResNet | ViT | Conformer | DeepSpeech | OGBG | WMT |
|---|---|---|---|---|---|---|---|---|
| AdamG | inf | inf | inf | inf | inf | inf | inf | inf |
| Baseline | 0.75 | 0.22 | inf | 0.95 | 0.94 | 0.65 | 0.46 | 0.84 |
| NadamW Sequential | 2.96 | 0.27 | inf | 1.58 | inf | 1.45 | 0.55 | 2.36 |
| Schedule Free AdamW | 0.75 | 0.15 | inf | 0.68 | 0.97 | 0.88 | 0.32 | 0.94 |
| Sinv6 | NaN | 0.49 | inf | inf | inf | 2.47 | 1.35 | 2.32 |
| Sinv6 75 | NaN | 0.45 | inf | inf | inf | 2.21 | 1.50 | 1.82 |
@fsschneider added the 👷 In Progress and 🛑 AlgoPerf Leaderboard labels on Jan 20, 2025
@fsschneider self-assigned this on Jan 20, 2025
fsschneider (Contributor, Author) commented:

External Tuning Ruleset

The main motivation for reducing the runtime budgets is to save compute resources. The following plots show how much compute we could save (y-axis) without (substantially) affecting the benchmark results.

Plot details: The x-axis sweeps hypothetical runtime budget cuts, given as percentages of the original budget. They range from 100% (the original budget) down to the smallest possible budget (i.e., the fastest a submission reached the target on a single run). The blue line denotes the total required compute (on this workload) as a percentage of the original cost, so at x = 100% we always have 100% of the cost. The blue line always lies above the gray dashed identity line: since training runs stopped once they hit the target (or failed with a NaN), runs that finish before the new cutoff keep their already reduced cost, so cutting the budget saves less than proportionally.
The (orange) vertical lines indicate the median per-workload runtime scores of the submissions, with the winner of the external tuning ruleset (Shampoo) in green and the baseline in black. They give us a sense of the point at which reducing the budget would start to affect the results.

I bolded my preferred option, but would be happy to discuss it.
Note: The compute reductions are computed per workload. For the overall compute, it is more important to save compute on resource-intensive workloads.
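For concreteness, here is a minimal sketch (in Python, with made-up per-run runtimes rather than the actual competition logs) of how the blue compute curve can be computed: the total compute at a reduced budget is the sum of each run's time truncated at the new cutoff, relative to the compute spent under the original budget.

```python
import numpy as np

# Hypothetical normalized runtimes (fraction of the original budget) of the
# scored runs on one workload; 1.0 means the run used the full budget
# (it missed the target or hit it right at the end).
runtimes = np.array([0.33, 0.23, 1.0, 0.25, 0.18, 0.29, 1.0, 0.22, 0.13, 0.21, 0.15])

def relative_compute(cut, runtimes):
    """Total compute with the budget reduced to `cut` (fraction of the original
    budget), relative to the compute spent under the original budget. Runs that
    finished before the new cutoff are unaffected; longer runs are truncated."""
    return np.minimum(runtimes, cut).sum() / runtimes.sum()

for cut in (1.0, 0.95, 0.85, 0.70):
    print(f"budget {cut:.0%} -> compute {relative_compute(cut, runtimes):.0%}")
# The compute curve always lies above the identity line y = x because runs
# shorter than the cutoff keep their full (already reduced) cost.
```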

Criteo 1TB

Image

This likely gives us four options (from least to most competitive):

  1. Don't reduce the budget.
  2. Reduce to 95%, saving roughly 3% compute. No submission would be affected.
  3. Reduce to 85%, saving roughly 11% compute. The Baseline would receive an infinite score.
  4. Reduce to 70%, saving roughly 24% compute. The Baseline, Generalized Adam, and NadamP would receive an infinite score.

fastMRI

Image

Due to our 4x rule (i.e., submissions get an infinite score when they need more than 4x the time of the fastest submission), we should reduce the budget at least down to 52% (fastest submission: CASPR Adaptive with 13%; 4 × 13% ≈ 52%). Given that the slowest submission took 33%, we have (roughly) the following options (see the short sketch after the list):

  1. Reduce to 50%, saving roughly 27%. No submission would be affected.
  2. Reduce to 35%, saving roughly 37%. No submission would be affected.
  3. Reduce to 25%, saving roughly 49%. Amos and LAWA EMA would receive an infinite score. (Cyclic LR requires exactly 25%).
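As a sanity check, here is a minimal sketch of the arithmetic behind these options, using the per-submission normalized fastMRI runtimes from the table above (the real evaluation aggregates multiple studies and runs, so this is only an approximation):

```python
# Normalized fastMRI runtimes from the table above (fraction of the original budget).
fastmri = {
    "Amos": 0.33, "Baseline": 0.23, "CASPR Adaptive": 0.13, "Cyclic LR": 0.25,
    "Generalized Adam": 0.18, "LAWA EMA": 0.29, "LAWA Queue": 0.22, "NadamP": 0.22,
    "Schedule Free AdamW": 0.13, "Schedule Free Prodigy": 0.21,
    "PyTorch Distr. Shampoo": 0.15,
}

fastest = min(fastmri.values())
print(f"4x rule: budget beyond {4 * fastest:.0%} of the original is wasted")  # ~52%

for cut in (0.50, 0.35, 0.25):
    affected = [name for name, t in fastmri.items() if t > cut]
    print(f"cut to {cut:.0%}: infinite score for {affected or 'no submission'}")
```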

ResNet

Discussed in a separate comment below

ViT

Image

ViT shows a very large spread between the fastest submission and the baseline. We could thus consider the following options:

  1. Reduce to 95%, saving roughly 3%. No submission would be affected.
  2. Reduce to 85%, saving roughly 11%. The Baseline and NadamP would receive an infinite score.
  3. Reduce to 70%, saving roughly 25%. The Baseline, Cyclic LR, Generalized Adam, LAWA EMA, and NadamP would receive an infinite score.

Conformer

Image

Here, I see the following options:

  1. Don't reduce the budget.
  2. Reduce to 95%, saving roughly 4%. No submission would be affected.
  3. Reduce to 90%, saving roughly 9%. Cyclic LR, NadamP, and Schedule Free AdamW would receive an infinite score.

DeepSpeech

Image

  1. Reduce to 80%, saving roughly 16%. No submission would be affected.
  2. Reduce to 70%, saving roughly 24%. CASPR, Cyclic LR, and Schedule Free AdamW would receive an infinite score.

OGBG

Image

Due to our 4x rule (i.e., submissions get an infinite score when they need more than 4x the time of the fastest submission), we should reduce the budget at least down to 48% (fastest submission: CASPR Adaptive with 12%; 4 × 12% = 48%). However, since the fastest submission might not necessarily be part of the next round of submissions, we could also consider 72% (since Shampoo required 18%, and 4 × 18% = 72%). I see the following options:

  1. Reduce to 65%, saving roughly 26%. No submission would be affected.
  2. Reduce to 48%, saving roughly 41%. Amos, LAWA EMA, and Schedule Free Prodigy would receive an infinite score.

WMT

Image

  1. Reduce to 90%, saving roughly 6%. No submission would be affected.
  2. Reduce to 70%, saving roughly 23%. The Baseline, LAWA EMA, NadamP, and Shampoo would receive an infinite score.

fsschneider (Contributor, Author) commented:

ResNet

Here, reducing the budget is not really an option. However, we did consider increasing the budget, as currently only a single submission hits the target (across both rulesets).
The following plots show that three additional submissions, the Baseline, NadamP, and Shampoo, came very close to hitting the target reliably.

Image
Image
Image

Increasing the budget slightly, e.g. by 5%, could result in them hitting the target, giving us a less binary score for ResNet.

Hitting the target on ResNet would result in a benchmark score increase of roughly 0.125 (i.e., 1/8, since the workload runtime on ResNet would then be roughly the same as the fastest submission's -> $\tau = 1$, increasing the normalized performance profile integral by ~1/8). This is quite a significant increase: the Baseline would jump from 6th to 3rd place (very close to 2nd), and NadamP from 5th to 2nd.
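A minimal sketch of why one additional solved workload adds roughly 1/8 to the score, assuming the score is the normalized area under the performance profile over the 8 workloads (the per-workload ratios below are made up, and the $\tau$ range used for integration is an assumption here):

```python
import numpy as np

# Hypothetical per-workload performance ratios for one submission
# (ratio = submission runtime / fastest runtime on that workload;
# inf = target not hit within the budget). 8 workloads, ResNet missed.
before = np.array([1.3, 1.1, np.inf, 1.6, 1.2, 1.4, 1.5, 1.0])
after = before.copy()
after[2] = 1.0  # ResNet now hit at roughly the fastest time, i.e. tau ~ 1

def profile_score(ratios, tau_max=4.0, n=10_000):
    """Normalized area under the performance profile rho(tau) on [1, tau_max],
    where rho(tau) is the fraction of workloads with ratio <= tau."""
    taus = np.linspace(1.0, tau_max, n)
    rho = (ratios[None, :] <= taus[:, None]).mean(axis=1)
    return rho.mean()  # mean over a uniform grid ~ normalized integral

print(profile_score(before), profile_score(after))
# The difference is ~1/8 = 0.125: the newly solved workload raises rho(tau)
# by exactly 1/8 for every tau >= 1.
```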

adefazio (Contributor) commented:

Thanks for preparing the hard numbers here! I am in agreement with all suggestions. As for ResNet, as I mentioned in the meeting, increasing the budget by 5% will result in more repeatable, lower-variance results across the board, which I am in favor of. Right now the performance profiles are very dependent on the seed values used for the ResNet runs, which is undesirable.
