I am trying to run mt train with the nllb_dense_3b architecture on an A6000 GPU,
but I get a "CUDA out of memory" error immediately at step 1.
The dataset is small: 1,700 sentences in two languages.
I tried PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, but it did not solve the issue.
Is it possible to tune some parameters to avoid this error?
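For completeness, the allocator option only takes effect if it is in the environment before the CUDA caching allocator initializes, and expandable_segments only mitigates fragmentation; it cannot help when the total footprint genuinely exceeds the card's memory. A minimal sketch of setting it programmatically (exporting the variable in the shell before launching fairseq2 mt train has the same effect):

```python
import os

# Must be set before the first CUDA allocation; equivalent to exporting the
# variable in the shell before launching the training command.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported after setting the env var on purpose

_ = torch.zeros(1, device="cuda")  # the allocator is configured from here on
```

The full log and traceback from the failing run follow.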
INFO fairseq2 - Running training on 1 device(s).
[01/16/25 13:55:04] ERROR fairseq2 - CUDA run out of memory. See the logged memory stats.
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 1 | cudaMalloc retries: 2 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 47142 MiB | 47142 MiB | 97762 MiB | 50620 MiB |
| from large pool | 47120 MiB | 47120 MiB | 95003 MiB | 47883 MiB |
| from small pool | 21 MiB | 339 MiB | 2759 MiB | 2737 MiB |
|---------------------------------------------------------------------------|
| Active memory | 47142 MiB | 47142 MiB | 97762 MiB | 50620 MiB |
| from large pool | 47120 MiB | 47120 MiB | 95003 MiB | 47883 MiB |
| from small pool | 21 MiB | 339 MiB | 2759 MiB | 2737 MiB |
|---------------------------------------------------------------------------|
| Requested memory | 47140 MiB | 47140 MiB | 97603 MiB | 50463 MiB |
| from large pool | 47118 MiB | 47118 MiB | 94845 MiB | 47726 MiB |
| from small pool | 21 MiB | 339 MiB | 2758 MiB | 2736 MiB |
|---------------------------------------------------------------------------|
| GPU reserved memory | 48266 MiB | 48270 MiB | 48510 MiB | 249856 KiB |
| from large pool | 48146 MiB | 48146 MiB | 48146 MiB | 0 KiB |
| from small pool | 120 MiB | 364 MiB | 364 MiB | 249856 KiB |
|---------------------------------------------------------------------------|
| Non-releasable memory | 1123 MiB | 3553 MiB | 106534 MiB | 105410 MiB |
| from large pool | 1025 MiB | 3491 MiB | 103461 MiB | 102436 MiB |
| from small pool | 98 MiB | 109 MiB | 3072 MiB | 2974 MiB |
|---------------------------------------------------------------------------|
| Allocations | 3763 | 3969 | 15318 | 11555 |
| from large pool | 1394 | 1394 | 3766 | 2372 |
| from small pool | 2369 | 2785 | 11552 | 9183 |
|---------------------------------------------------------------------------|
| Active allocs | 3763 | 3969 | 15318 | 11555 |
| from large pool | 1394 | 1394 | 3766 | 2372 |
| from small pool | 2369 | 2785 | 11552 | 9183 |
|---------------------------------------------------------------------------|
| GPU reserved segments | 1183 | 1293 | 1305 | 122 |
| from large pool | 1123 | 1123 | 1123 | 0 |
| from small pool | 60 | 182 | 182 | 122 |
|---------------------------------------------------------------------------|
| Non-releasable allocs | 383 | 476 | 6113 | 5730 |
| from large pool | 233 | 233 | 1898 | 1665 |
| from small pool | 150 | 319 | 4215 | 4065 |
|---------------------------------------------------------------------------|
| Oversize allocations | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| Oversize GPU segments | 0 | 0 | 0 | 0 |
|===========================================================================|
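For reference, the table above matches the format of torch.cuda.memory_summary(), which is presumably what fairseq2 prints when it says "See the logged memory stats." The same counters can be queried directly; a short sketch:

```python
import torch

device = torch.device("cuda:0")

# High-level counters behind the table above (values in bytes).
allocated = torch.cuda.memory_allocated(device)   # "Allocated memory"
reserved = torch.cuda.memory_reserved(device)     # "GPU reserved memory"
peak = torch.cuda.max_memory_allocated(device)    # "Peak Usage" column

print(f"allocated={allocated / 2**20:.0f} MiB, "
      f"reserved={reserved / 2**20:.0f} MiB, "
      f"peak={peak / 2**20:.0f} MiB")

# Full report in the same format as the table above.
print(torch.cuda.memory_summary(device))
```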
Traceback (most recent call last):
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/__init__.py", line 40, in main
    exit_code = _run()
                ^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/__init__.py", line 79, in _run
    return cli.run()
           ^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/cli.py", line 159, in run
    return args.command.run(args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/cli.py", line 379, in run
    return self._handler.run(self._parser, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/cli.py", line 530, in run
    program.run(args)
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/cli.py", line 658, in run
    self._runner.run(config, output_dir)
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/runner.py", line 90, in run
    recipe()
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/trainer.py", line 629, in __call__
    self._do_run()
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/trainer.py", line 683, in _do_run
    self._run_step()
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/trainer.py", line 778, in _run_step
    _, scale_result = self._loss_scaler.run_optimizer_step(step_nr)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/optim/_dynamic_loss_scaler.py", line 170, in run_optimizer_step
    loss = self._grad_scaler.step(self._optimizer, closure)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/torch/amp/grad_scaler.py", line 457, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/torch/amp/grad_scaler.py", line 352, in _maybe_opt_step
    retval = optimizer.step(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
    return func.__get__(opt, opt.__class__)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/torch/optim/optimizer.py", line 487, in wrapper
    out = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/optim/_optimizer.py", line 38, in step
    self._do_step()
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/optim/_adamw.py", line 164, in _do_step
    self._init_param(
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/optim/_adamw.py", line 271, in _init_param
    state["exp_avg_sq"] = torch.zeros_like(param)
                          ^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB. GPU 0 has a total capacity of 47.53 GiB of which 13.88 MiB is free. Including non-PyTorch memory, this process has 47.46 GiB memory in use. Of the allocated memory 46.04 GiB is allocated by PyTorch, and 1.10 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
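The allocation that fails is the lazy initialization of AdamW's second-moment buffer (exp_avg_sq), i.e. the optimizer state is being materialized on top of the weights and gradients that already fill the card at the first optimizer step. A rough back-of-the-envelope sketch of why a dense ~3.3B-parameter model is a tight fit in 48 GiB with plain AdamW; the parameter count and fp32 dtypes here are illustrative assumptions, not values read from the recipe config:

```python
# Rough memory budget for training a ~3.3B-parameter dense model with AdamW.
# All numbers are illustrative assumptions, not values from the recipe.
params = 3.3e9
fp32 = 4  # bytes per element

weights = params * fp32      # ~12.3 GiB model weights
grads = params * fp32        # ~12.3 GiB gradients
exp_avg = params * fp32      # ~12.3 GiB AdamW first moment
exp_avg_sq = params * fp32   # ~12.3 GiB AdamW second moment (fails here)

total = weights + grads + exp_avg + exp_avg_sq
print(f"~{total / 2**30:.0f} GiB before activations")  # ~49 GiB > 47.53 GiB
```

Under these assumptions the optimizer states alone roughly double the footprint of weights plus gradients, which would be consistent with the OOM hitting at torch.zeros_like(param) for exp_avg_sq while 46 GiB is already allocated. If that is what is happening, an allocator setting cannot recover the shortfall; only shrinking the per-parameter state or sharding/offloading it would.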