I am trying to run mt train with the nllb_dense_3b architecture on an A6000 GPU,
but I get a "CUDA out of memory" error immediately at step 1.
The dataset is small: 1,700 sentences in two languages.
I tried PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, but it did not solve the issue.
Is it possible to tune some parameters to avoid this error?
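For completeness, the allocator option only takes effect if it is in the environment before the CUDA caching allocator initializes, and expandable_segments only mitigates fragmentation; it cannot help when the total footprint genuinely exceeds the card's memory. A minimal sketch of setting it programmatically (exporting the variable in the shell before launching fairseq2 mt train has the same effect):

```python
import os

# Must be set before the first CUDA allocation; equivalent to exporting the
# variable in the shell before launching the training command.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported after setting the env var on purpose

_ = torch.zeros(1, device="cuda")  # the allocator is configured from here on
```

The full log and traceback from the failing run follow.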
INFO fairseq2 - Running training on 1 device(s).
[01/16/25 13:55:04] ERROR fairseq2 - CUDA run out of memory. See the logged memory stats.
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 1 | cudaMalloc retries: 2 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 47142 MiB | 47142 MiB | 97762 MiB | 50620 MiB |
| from large pool | 47120 MiB | 47120 MiB | 95003 MiB | 47883 MiB |
| from small pool | 21 MiB | 339 MiB | 2759 MiB | 2737 MiB |
|---------------------------------------------------------------------------|
| Active memory | 47142 MiB | 47142 MiB | 97762 MiB | 50620 MiB |
| from large pool | 47120 MiB | 47120 MiB | 95003 MiB | 47883 MiB |
| from small pool | 21 MiB | 339 MiB | 2759 MiB | 2737 MiB |
|---------------------------------------------------------------------------|
| Requested memory | 47140 MiB | 47140 MiB | 97603 MiB | 50463 MiB |
| from large pool | 47118 MiB | 47118 MiB | 94845 MiB | 47726 MiB |
| from small pool | 21 MiB | 339 MiB | 2758 MiB | 2736 MiB |
|---------------------------------------------------------------------------|
| GPU reserved memory | 48266 MiB | 48270 MiB | 48510 MiB | 249856 KiB |
| from large pool | 48146 MiB | 48146 MiB | 48146 MiB | 0 KiB |
| from small pool | 120 MiB | 364 MiB | 364 MiB | 249856 KiB |
|---------------------------------------------------------------------------|
| Non-releasable memory | 1123 MiB | 3553 MiB | 106534 MiB | 105410 MiB |
| from large pool | 1025 MiB | 3491 MiB | 103461 MiB | 102436 MiB |
| from small pool | 98 MiB | 109 MiB | 3072 MiB | 2974 MiB |
|---------------------------------------------------------------------------|
| Allocations | 3763 | 3969 | 15318 | 11555 |
| from large pool | 1394 | 1394 | 3766 | 2372 |
| from small pool | 2369 | 2785 | 11552 | 9183 |
|---------------------------------------------------------------------------|
| Active allocs | 3763 | 3969 | 15318 | 11555 |
| from large pool | 1394 | 1394 | 3766 | 2372 |
| from small pool | 2369 | 2785 | 11552 | 9183 |
|---------------------------------------------------------------------------|
| GPU reserved segments | 1183 | 1293 | 1305 | 122 |
| from large pool | 1123 | 1123 | 1123 | 0 |
| from small pool | 60 | 182 | 182 | 122 |
|---------------------------------------------------------------------------|
| Non-releasable allocs | 383 | 476 | 6113 | 5730 |
| from large pool | 233 | 233 | 1898 | 1665 |
| from small pool | 150 | 319 | 4215 | 4065 |
|---------------------------------------------------------------------------|
| Oversize allocations | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| Oversize GPU segments | 0 | 0 | 0 | 0 |
|===========================================================================|
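For reference, the table above matches the format of torch.cuda.memory_summary(), which is presumably what fairseq2 prints when it says "See the logged memory stats." The same counters can be queried directly; a short sketch:

```python
import torch

device = torch.device("cuda:0")

# High-level counters behind the table above (values in bytes).
allocated = torch.cuda.memory_allocated(device)   # "Allocated memory"
reserved = torch.cuda.memory_reserved(device)     # "GPU reserved memory"
peak = torch.cuda.max_memory_allocated(device)    # "Peak Usage" column

print(f"allocated={allocated / 2**20:.0f} MiB, "
      f"reserved={reserved / 2**20:.0f} MiB, "
      f"peak={peak / 2**20:.0f} MiB")

# Full report in the same format as the table above.
print(torch.cuda.memory_summary(device))
```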
Traceback (most recent call last):
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/__init__.py", line 40, in main
    exit_code = _run()
                ^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/__init__.py", line 79, in _run
    return cli.run()
           ^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/cli.py", line 159, in run
    return args.command.run(args)  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/cli.py", line 379, in run
    return self._handler.run(self._parser, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/cli.py", line 530, in run
    program.run(args)
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/cli.py", line 658, in run
    self._runner.run(config, output_dir)
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/runner.py", line 90, in run
    recipe()
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/trainer.py", line 629, in __call__
    self._do_run()
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/trainer.py", line 683, in _do_run
    self._run_step()
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/recipes/trainer.py", line 778, in _run_step
    _, scale_result = self._loss_scaler.run_optimizer_step(step_nr)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/optim/_dynamic_loss_scaler.py", line 170, in run_optimizer_step
    loss = self._grad_scaler.step(self._optimizer, closure)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/torch/amp/grad_scaler.py", line 457, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/torch/amp/grad_scaler.py", line 352, in _maybe_opt_step
    retval = optimizer.step(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/torch/optim/lr_scheduler.py", line 137, in wrapper
    return func.__get__(opt, opt.__class__)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/torch/optim/optimizer.py", line 487, in wrapper
    out = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/optim/_optimizer.py", line 38, in step
    self._do_step()
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/optim/_adamw.py", line 164, in _do_step
    self._init_param(
  File "/home/sortm/miniconda3/envs/fairseq2/lib/python3.12/site-packages/fairseq2/optim/_adamw.py", line 271, in _init_param
    state["exp_avg_sq"] = torch.zeros_like(param)
                          ^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB. GPU 0 has a total capacity of 47.53 GiB of which 13.88 MiB is free. Including non-PyTorch memory, this process has 47.46 GiB memory in use. Of the allocated memory 46.04 GiB is allocated by PyTorch, and 1.10 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
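The allocation that fails is the lazy initialization of AdamW's second-moment buffer (exp_avg_sq), i.e. the optimizer state is being materialized on top of the weights and gradients that already fill the card at the first optimizer step. A rough back-of-the-envelope sketch of why a dense ~3.3B-parameter model is a tight fit in 48 GiB with plain AdamW; the parameter count and fp32 dtypes here are illustrative assumptions, not values read from the recipe config:

```python
# Rough memory budget for training a ~3.3B-parameter dense model with AdamW.
# All numbers are illustrative assumptions, not values from the recipe.
params = 3.3e9
fp32 = 4  # bytes per element

weights = params * fp32      # ~12.3 GiB model weights
grads = params * fp32        # ~12.3 GiB gradients
exp_avg = params * fp32      # ~12.3 GiB AdamW first moment
exp_avg_sq = params * fp32   # ~12.3 GiB AdamW second moment (fails here)

total = weights + grads + exp_avg + exp_avg_sq
print(f"~{total / 2**30:.0f} GiB before activations")  # ~49 GiB > 47.53 GiB
```

Under these assumptions the optimizer states alone roughly double the footprint of weights plus gradients, which would be consistent with the OOM hitting at torch.zeros_like(param) for exp_avg_sq while 46 GiB is already allocated. If that is what is happening, an allocator setting cannot recover the shortfall; only shrinking the per-parameter state or sharding/offloading it would.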