From 9/12 meeting notes:
Recap: It is useful to shard optimizer state across devices (to save significant memory). This reflects current practice. We want to support it. We don’t want to support arbitrary model parallelism.
Sourabh: We could allow model-agnostic model parameter sharding.
Michael: We still want to ensure that the frameworks are comparable.
Proposal: Switch from no sharding to naive model parameter sharding. Switch from pmap to jit in JAX and allow optimizer state sharding (that follows the model parameter sharding) in both frameworks.
Forbid (in the rules) any hacks that change the model parallelization strategy.
Have workload-default sharding. Allow submitters to opt-out of it on a per-workload basis.
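The pmap-to-jit switch with optimizer state following the parameter sharding could look like the minimal JAX sketch below. The mesh axis name, parameter shapes, and the hand-rolled momentum update are illustrative assumptions, not part of the proposal; the point is that the optimizer buffer is placed with the same `NamedSharding` as the parameter it tracks, and the update runs under `jax.jit` rather than `pmap`.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Hypothetical 1-D mesh over all available devices ("data" axis name is arbitrary).
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))
sharding = NamedSharding(mesh, P("data"))  # shard leading axis across devices

# Parameters sharded along their first axis; the momentum buffer (optimizer
# state) is given the *same* sharding, per the proposal.
params = jax.device_put(jnp.ones((8 * len(devices), 4)), sharding)
momentum = jax.device_put(jnp.zeros_like(params), sharding)

@jax.jit
def sgd_momentum_step(params, momentum, grads, lr=0.1, beta=0.9):
    # Toy SGD-with-momentum update, just to exercise the sharded buffers.
    momentum = beta * momentum + grads
    return params - lr * momentum, momentum

grads = jax.device_put(jnp.full_like(params, 0.5), sharding)
params, momentum = sgd_momentum_step(params, momentum, grads)
```

Under jit, the compiler propagates the input shardings through the update, so no per-device collectives need to be written by hand, unlike with pmap.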