CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected #923

Open
soymintc opened this issue Jan 8, 2025 · 1 comment

@soymintc

soymintc commented Jan 8, 2025

Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.8/docs/FAQ.md:

Describe the issue:
The submitted job does not use the GPU; it runs only on the CPUs.

Setup

Steps to reproduce:

  • Command:
INPUT_DIR="${PWD}/quickstart-testdata"
DATA_HTTP_DIR="https://storage.googleapis.com/deepvariant/quickstart-testdata"
OUTPUT_DIR="${PWD}/quickstart-output"
mkdir -p "${OUTPUT_DIR}"
sbatch --gres=gpu:1 --mem=24G --partition=componc_gpu --job-name=test \
  --output=test.out --error=test.err \
  --wrap="export CUDA_VISIBLE_DEVICES=0; \
    export TF_FORCE_GPU_ALLOW_GROWTH=true; \
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:\$LD_LIBRARY_PATH; \
    singularity run --nv -B /usr/lib/locale/:/usr/lib/locale/ -B /data1 -B /home \
      ~/data1/singularity/sifs/deepvariant-1.8.0-gpu.sif \
      /opt/deepvariant/bin/run_deepvariant \
      --model_type WGS --vcf_stats_report true \
      --ref ${INPUT_DIR}/ucsc.hg19.chr20.unittest.fasta \
      --reads ${INPUT_DIR}/NA12878_S1.chr20.10_10p1mb.bam \
      --regions chr20:10000000-10010000 \
      --output_vcf ${OUTPUT_DIR}/output.vcf.gz \
      --output_gvcf ${OUTPUT_DIR}/output.g.vcf.gz \
      --intermediate_results_dir ${OUTPUT_DIR}/intermediate_results_dir \
      --num_shards 2; \
    echo 'CUDA_VISIBLE_DEVICES:' \$CUDA_VISIBLE_DEVICES; \
    echo 'LD_LIBRARY_PATH:' \$LD_LIBRARY_PATH; \
    nvidia-smi;"
  • Error trace: (if applicable)
2025-01-08 12:50:15.900768: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-08 12:50:16.282094: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-01-08 12:50:18.366498: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
2025-01-08 12:50:18.369116: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
2025-01-08 12:50:18.369250: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
I0108 12:50:21.480433 139804324811904 run_deepvariant.py:649] Re-using the directory for intermediate results in /data1/shahs3/users/chois7/tickets/gpu-deepvariant/quickstart-output/intermediate_results_dir
I0108 12:50:21.481722 139804324811904 run_deepvariant.py:847] env = {'SHELL': '/bin/bash', 'NV_LIBCUBLAS_VERSION': '11.11.3.6-1', 'NVIDIA_VISIBLE_DEVICES': 'all', 'NV_NVML_DEV_VERSION': '11.8.86-1', 'NV_CUDNN_PACKAGE_NAME': 'libcudnn8', 'SLURM_JOB_USER': 'chois7', 'SLURM_TASKS_PER_NODE': '1', 'SLURM_JOB_UID': '164064212', 'HISTCONTROL': 'ignoredups', 'NV_LIBNCCL_DEV_PACKAGE': 'libnccl-dev=2.15.5-1+cuda11.8', 'SLURM_TASK_PID': '3398772', 'CONDA_EXE': '/home/chois7/miniforge3/bin/conda', '_CE_M': '', 'NV_LIBNCCL_DEV_PACKAGE_VERSION': '2.15.5-1', 'PKG_CONFIG_PATH': '/home/chois7/packages/lib/pkgconfig:/home/chois7/packages/lib/pkgconfig:/home/chois7/packages/lib/pkgconfig:/home/chois7/packages/lib/pkgconfig:', 'SLURM_JOB_GPUS': '3', 'SLURM_LOCALID': '0', 'SLURM_SUBMIT_DIR': '/data1/shahs3/users/chois7/tickets/gpu-deepvariant', 'ONCOKB_API_KEY': '<redacted>', 'HISTSIZE': '1000', 'HOSTNAME': 'iscf031', 'PYTHON_VERSION': '3.10', 'LANGUAGE': 'en', 'SINGULARITY_NAME': 'deepvariant-1.8.0-gpu.sif', 'SLURMD_NODENAME': 'iscf031', 'SLURM_JOB_START_TIME': '1736358615', 'TERMCAP': 'SC|xterm-256color|VT 100/ANSI X3.64 virtual 
terminal:\\\n\t:DO=\\E[%dB:LE=\\E[%dD:RI=\\E[%dC:UP=\\E[%dA:bs:bt=\\E[Z:\\\n\t:cd=\\E[J:ce=\\E[K:cl=\\E[H\\E[J:cm=\\E[%i%d;%dH:ct=\\E[3g:\\\n\t:do=^J:nd=\\E[C:pt:rc=\\E8:rs=\\Ec:sc=\\E7:st=\\EH:up=\\EM:\\\n\t:le=^H:bl=^G:cr=^M:it#8:ho=\\E[H:nw=\\EE:ta=^I:is=\\E)0:\\\n\t:li#51:co#209:am:xn:xv:LP:sr=\\EM:al=\\E[L:AL=\\E[%dL:\\\n\t:cs=\\E[%i%d;%dr:dl=\\E[M:DL=\\E[%dM:dc=\\E[P:DC=\\E[%dP:\\\n\t:im=\\E[4h:ei=\\E[4l:mi:IC=\\E[%d@:ks=\\E[?1h\\E=:\\\n\t:ke=\\E[?1l\\E>:vi=\\E[?25l:ve=\\E[34h\\E[?25h:vs=\\E[34l:\\\n\t:ti=\\E[?1049h:te=\\E[?1049l:us=\\E[4m:ue=\\E[24m:so=\\E[3m:\\\n\t:se=\\E[23m:mb=\\E[5m:md=\\E[1m:mh=\\E[2m:mr=\\E[7m:\\\n\t:me=\\E[m:ms:\\\n\t:Co#8:pa#64:AF=\\E[3%dm:AB=\\E[4%dm:op=\\E[39;49m:AX:\\\n\t:vb=\\Eg:G0:as=\\E(0:ae=\\E(B:\\\n\t:ac=\\140\\140aaffggjjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~..--++,,hhII00:\\\n\t:po=\\E[5i:pf=\\E[4i:Km=\\E[M:k0=\\E[10~:k1=\\EOP:k2=\\EOQ:\\\n\t:k3=\\EOR:k4=\\EOS:k5=\\E[15~:k6=\\E[17~:k7=\\E[18~:\\\n\t:k8=\\E[19~:k9=\\E[20~:k;=\\E[21~:F1=\\E[23~:F2=\\E[24~:\\\n\t:F3=\\E[1;2P:F4=\\E[1;2Q:F5=\\E[1;2R:F6=\\E[1;2S:\\\n\t:F7=\\E[15;2~:F8=\\E[17;2~:F9=\\E[18;2~:FA=\\E[19;2~:\\\n\t:FB=\\E[20;2~:FC=\\E[21;2~:FD=\\E[23;2~:FE=\\E[24;2~:kb=\x7f:\\\n\t:K2=\\EOE:kB=\\E[Z:kF=\\E[1;2B:kR=\\E[1;2A:*4=\\E[3;2~:\\\n\t:*7=\\E[1;2F:#2=\\E[1;2H:#3=\\E[2;2~:#4=\\E[1;2D:%c=\\E[6;2~:\\\n\t:%e=\\E[5;2~:%i=\\E[1;2C:kh=\\E[1~:@1=\\E[1~:kH=\\E[4~:\\\n\t:@7=\\E[4~:kN=\\E[6~:kP=\\E[5~:kI=\\E[2~:kD=\\E[3~:ku=\\EOA:\\\n\t:kd=\\EOB:kr=\\EOC:kl=\\EOD:km:', 'HYDRA_LAUNCHER_EXTRA_ARGS': '--external-launcher', 'JAVA_HOME': '/home/chois7/miniforge3/envs/py11/lib/jvm', 'NVIDIA_REQUIRE_CUDA': 'cuda>=11.8 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 
brand=titanrtx,driver>=470,driver<471', 'NV_LIBCUBLAS_DEV_PACKAGE': 'libcublas-dev-11-8=11.11.3.6-1', 'NV_NVTX_VERSION': '11.8.86-1', 'USER_PRINCIPAL_NAME': '[email protected]', 'JAVA_LD_LIBRARY_PATH': '/home/chois7/miniforge3/envs/py11/lib/jvm/lib/server', 'WINDOW': '10', 'NV_CUDA_CUDART_DEV_VERSION': '11.8.89-1', 'NV_LIBCUSPARSE_VERSION': '11.7.5.86-1', 'SLURM_CLUSTER_NAME': 'iris', 'SLURM_JOB_END_TIME': '1736365815', 'NV_LIBNPP_VERSION': '11.8.0.86-1', 'SLURM_CPUS_ON_NODE': '1', 'DV_GPU_BUILD': '1', 'SINGULARITY_ENVIRONMENT': '/.singularity.d/env/91-environment.sh', 'NCCL_VERSION': '2.15.5-1', 'SLURM_JOB_CPUS_PER_NODE': '1', 'XML_CATALOG_FILES': 'file:///home/chois7/miniforge3/envs/py11/etc/xml/catalog file:///etc/xml/catalog', 'LMOD_DIR': '/usr/share/lmod/lmod/libexec', 'EDITOR': 'vim', 'TF_FORCE_GPU_ALLOW_GROWTH': 'true', 'SLURM_GPUS_ON_NODE': '1', 'KRB5CCNAME': 'FILE:/tmp/krb5cc_164064212', 'PRTE_MCA_plm_slurm_args': '--external-launcher', 'PWD': '/data1/shahs3/users/chois7/tickets/gpu-deepvariant', 'SLURM_GTIDS': '0', 'ISABL_CLIENT_ID': '3', 'GSETTINGS_SCHEMA_DIR': '/home/chois7/miniforge3/envs/py11/share/glib-2.0/schemas', 'DA_SESSION_ID_AUTH': '42ca2176-265a-204c-b0bf-68914c952524', 'LOGNAME': 'chois7', 'CONDA_PREFIX': '/home/chois7/miniforge3/envs/py11', 'NV_CUDNN_PACKAGE': 'libcudnn8=8.9.6.50-1+cuda11.8', 'SLURM_JOB_PARTITION': 'componc_gpu', 'MODULESHOME': '/usr/share/lmod/lmod', 'NVIDIA_DRIVER_CAPABILITIES': 'compute,utility', 'ISABL_API_URL': 'https://isabl.shahlab.mskcc.org/api/v1/', 'MANPATH': '/usr/share/lmod/lmod/share/man:', 'NXF_SINGULARITY_CACHEDIR': '/data1/shahs3/users/chois7/tmp/.cache', 'NV_NVPROF_DEV_PACKAGE': 'cuda-nvprof-11-8=11.8.87-1', 'NV_LIBNPP_PACKAGE': 'libnpp-11-8=11.8.0.86-1', 'TF_ENABLE_ONEDNN_OPTS': '1', 'GSETTINGS_SCHEMA_DIR_CONDA_BACKUP': '', 'NV_LIBNCCL_DEV_PACKAGE_NAME': 'libnccl-dev', 'SLURM_JOB_NUM_NODES': '1', 'SCREENDIR': '/home/chois7/.screen', 'SLURM_JOBID': '10976583', 'NV_LIBCUBLAS_DEV_VERSION': '11.11.3.6-1', 
'NVIDIA_PRODUCT_NAME': 'CUDA', 'I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS': '--external-launcher', 'SLURM_JOB_QOS': 'normal', 'USER_PATH': '/home/chois7/packages/bin:/data1/shahs3/users/chois7/packages/annovar:/data1/shahs3/users/chois7/packages/node-v20.11.1-linux-x64/bin:/home/chois7/miniforge3/envs/py11/bin:/home/chois7/miniforge3/bin:/home/chois7/.cargo/bin:/home/chois7/packages/bin:/data1/shahs3/users/chois7/packages/annovar:/data1/shahs3/users/chois7/packages/node-v20.11.1-linux-x64/bin:/home/chois7/miniforge3/bin:/home/chois7/packages/bin:/data1/shahs3/users/chois7/packages/annovar:/data1/shahs3/users/chois7/packages/node-v20.11.1-linux-x64/bin:/home/chois7/data1/envs/apps/bin:/home/chois7/miniforge3/bin:/home/chois7/miniforge3/condabin:/data1/shahs3/users/chois7/packages/google-cloud-sdk/bin:/home/chois7/packages/bin:/data1/shahs3/users/chois7/packages/annovar:/data1/shahs3/users/chois7/packages/node-v20.11.1-linux-x64/bin:/home/chois7/miniforge3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/chois7/bin:/home/chois7/bin:/home/chois7/bin:/home/chois7/.local/bin:/home/chois7/bin:/home/chois7/.local/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin', 'NV_LIBCUBLAS_DEV_PACKAGE_NAME': 'libcublas-dev-11-8', 'CAPSULE_LOG': 'none', 'NV_CUDA_CUDART_VERSION': '11.8.89-1', 'HOME': '/home/chois7', 'LANG': 'C', 'GITHUB_TOKEN': '<redacted>', 'LS_COLORS': 'di=00;94:ow=1;34:tw=1;33:fi=0:ln=32:pi=5:so=5:bd=5:cd=5:or=31:mi=0:ex=35:*.rpm=90', 'SLURM_PROCID': '0', 'CUDA_VERSION': '11.8.0', 'SINGULARITY_CONTAINER': '/home/chois7/data1/singularity/sifs/deepvariant-1.8.0-gpu.sif', 'NV_LIBCUBLAS_PACKAGE': 'libcublas-11-8=11.11.3.6-1', 'LMOD_SETTARG_FULL_SUPPORT': 'no', 'NV_CUDA_NSIGHT_COMPUTE_DEV_PACKAGE': 'cuda-nsight-compute-11-8=11.8.0-1', 'CONDA_PROMPT_MODIFIER': '(py11) ', 'TMPDIR': '/data1/shahs3/users/chois7/tmp/.cache', 'PROMPT_COMMAND': 'PS1="Singularity> "; unset PROMPT_COMMAND', 'SLURM_TOPOLOGY_ADDR': 'iscf031', 
'LMOD_VERSION': '8.7.32', 'SSH_CONNECTION': '10.18.25.140 53770 10.247.112.114 22', 'NV_LIBNPP_DEV_PACKAGE': 'libnpp-dev-11-8=11.8.0.86-1', 'PIP_CACHE_DIR': '/data1/shahs3/users/chois7/tmp/.cache', 'NV_LIBCUBLAS_PACKAGE_NAME': 'libcublas-11-8', 'XDG_CACHE_HOME': '/data1/shahs3/users/chois7/tmp/.cache', 'HYDRA_BOOTSTRAP': 'slurm', 'NV_LIBNPP_DEV_VERSION': '11.8.0.86-1', 'MODULEPATH_ROOT': '/usr/share/modulefiles', 'CUDA_VISIBLE_DEVICES': '0', 'JAVA_LD_LIBRARY_PATH_BACKUP': '/home/chois7/miniforge3/envs/py11/lib/jvm/lib/server', 'SLURM_TOPOLOGY_ADDR_PATTERN': 'node', 'LMOD_PKG': '/usr/share/lmod/lmod', 'SLURM_MEM_PER_NODE': '24576', 'TERM': 'xterm-256color', 'NV_LIBCUSPARSE_DEV_VERSION': '11.7.5.86-1', '_CE_CONDA': '', 'LESSOPEN': '||/usr/bin/lesspipe.sh %s', 'USER': 'chois7', 'CDC_PREW2KHOST': 'islogin01', 'CDC_JOINED_SITE': 'SDC', 'LIBRARY_PATH': '/usr/local/cuda/lib64/stubs', 'NV_CUDNN_VERSION': '8.9.6.50', 'SLURM_NODELIST': 'iscf031', 'CDC_JOINED_ZONE': 'CN=IRIS,CN=SDC_Zone,CN=MSK_Digits_HPC_Zone,CN=Zones,OU=Centrify,OU=HPC,OU=Resources,DC=MSKCC,DC=ROOT,DC=MSKCC,DC=ORG', 'ENVIRONMENT': 'BATCH', 'CONDA_SHLVL': '6', 'SLURM_JOB_ACCOUNT': 'shahs3', 'SLURM_PRIO_PROCESS': '0', 'LMOD_ROOT': '/usr/share/lmod', 'SHLVL': '4', 'SLURM_NNODES': '1', 'BASH_ENV': '/usr/share/lmod/lmod/init/bash', 'CDC_JOINED_DOMAIN': 'mskcc.root.mskcc.org', 'NV_CUDA_LIB_VERSION': '11.8.0-1', 'NVARCH': 'x86_64', 'LMOD_sys': 'Linux', 'LC_MESSAGES': 'C', 'SINGULARITY_BIND': '/usr/lib/locale/:/usr/lib/locale/,/data1,/home', 'NV_CUDNN_PACKAGE_DEV': 'libcudnn8-dev=8.9.6.50-1+cuda11.8', 'SLURM_SUBMIT_HOST': 'islogin01.mskcc.org', 'NV_CUDA_COMPAT_PACKAGE': 'cuda-compat-11-8', 'CONDA_PYTHON_EXE': '/home/chois7/miniforge3/bin/python', 'NV_LIBNCCL_PACKAGE': 'libnccl2=2.15.5-1+cuda11.8', 'LD_LIBRARY_PATH': '/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs', 'LC_CTYPE': 'en_US.utf8', 'SLURM_JOB_ID': '10976583', 'SLURM_NODEID': '0', 'SSH_CLIENT': '10.18.25.140 53770 22', 'CDC_JOINED_DC': 
'vsstgpmaddns1.mskcc.root.mskcc.org', 'CONDA_DEFAULT_ENV': 'py11', 'NV_CUDA_NSIGHT_COMPUTE_VERSION': '11.8.0-1', 'JAVA_HOME_CONDA_BACKUP': '/home/chois7/miniforge3/envs/py11/lib/jvm', 'which_declare': 'declare -f', 'NV_NVPROF_VERSION': '11.8.87-1', 'SLURM_CONF': '/etc/slurm/slurm.conf', 'PATH': '/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/conda/bin:/opt/conda/envs/bio/bin:/opt/deepvariant/bin', 'STY': '847481.iris', 'SLURM_JOB_NAME': 'test', 'MODULEPATH': '/etc/modulefiles:/usr/share/modulefiles:/admin/software/lmod/modulefiles', 'NV_LIBNCCL_PACKAGE_NAME': 'libnccl2', 'VERSION': '1.8.0', 'NV_LIBNCCL_PACKAGE_VERSION': '2.15.5-1', 'LMOD_CMD': '/usr/share/lmod/lmod/libexec/lmod', 'MAIL': '/var/spool/mail/chois7', 'SSH_TTY': '/dev/pts/51', 'CDC_LOCALHOST': 'islogin01.mskcc.org', 'CONDA_PREFIX_1': '/home/chois7/miniforge3', 'CONDA_PREFIX_2': '/home/chois7/miniforge3/envs/py11', 'CONDA_PREFIX_3': '/home/chois7/miniforge3', 'CONDA_PREFIX_4': '/home/chois7/miniforge3/envs/py11', 'OMPI_MCA_plm_slurm_args': '--external-launcher', 'SINGULARITY_COMMAND': 'run', 'CONDA_PREFIX_5': '/home/chois7/miniforge3', 'SLURM_JOB_GID': '164064212', 'OLDPWD': '/home/chois7/data1/projects/signatures-ont/deepsomatic', 'SLURM_JOB_NODELIST': 'iscf031', 'I_MPI_HYDRA_BOOTSTRAP': 'slurm', 'BASH_FUNC_ml%%': '() {  eval "$($LMOD_DIR/ml_cmd "$@")"\n}', 'BASH_FUNC_which%%': '() {  ( alias;\n eval ${which_declare} ) | /usr/bin/which --tty-only --read-alias --read-functions --show-tilde --show-dot $@\n}', 'BASH_FUNC_module%%': '() {  if [ -z "${LMOD_SH_DBG_ON+x}" ]; then\n case "$-" in \n *v*x*)\n __lmod_sh_dbg=\'vx\'\n ;;\n *v*)\n __lmod_sh_dbg=\'v\'\n ;;\n *x*)\n __lmod_sh_dbg=\'x\'\n ;;\n esac;\n fi;\n if [ -n "${__lmod_sh_dbg:-}" ]; then\n set +$__lmod_sh_dbg;\n echo "Shell debugging temporarily silenced: export LMOD_SH_DBG_ON=1 for Lmod\'s output" 1>&2;\n fi;\n eval "$($LMOD_CMD shell "$@")" && eval "$(${LMOD_SETTARG_CMD:-:} -s sh)";\n 
__lmod_my_status=$?;\n if [ -n "${__lmod_sh_dbg:-}" ]; then\n echo "Shell debugging restarted" 1>&2;\n set -$__lmod_sh_dbg;\n fi;\n unset __lmod_sh_dbg;\n return $__lmod_my_status\n}', '_': '/usr/bin/python3', 'TPU_ML_PLATFORM': 'Tensorflow', 'TF2_BEHAVIOR': '1', 'TF_CPP_MIN_LOG_LEVEL': '1'}
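
The command exports CUDA_VISIBLE_DEVICES=0 by hand, while the logged environment shows that SLURM assigned SLURM_JOB_GPUS '3' to the job. One way to see what SLURM itself provides, before that override, is a small helper like the one below. It is a hypothetical sketch (the name report_gpu_env is made up for illustration) and assumes the job shell is bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DeepVariant or SLURM): print the GPU-related
# variables as they are set in the job, or "unset" if a variable is missing.
report_gpu_env() {
  local var
  for var in SLURM_JOB_GPUS SLURM_GPUS_ON_NODE CUDA_VISIBLE_DEVICES; do
    # ${!var} is bash indirect expansion: the value of the variable named in $var.
    echo "${var}=${!var:-unset}"
  done
}
```

Calling report_gpu_env as the first step of the job, before the export of CUDA_VISIBLE_DEVICES=0, would show whether SLURM already sets CUDA_VISIBLE_DEVICES for the allocated GPU and which index it uses.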

The following error messages also appear in the job's stdout:

2025-01-08 13:06:30.723603: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2025-01-08 13:06:35.873709: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:267] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2025-01-08 13:06:47.534398: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:1278] could not retrieve CUDA device count: CUDA_ERROR_NOT_INITIALIZED: initialization error
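
The cuInit CUDA_ERROR_NO_DEVICE errors mean the CUDA driver reported no usable GPU to the process running inside the container. A way to check this independently of DeepVariant is to ask TensorFlow directly which GPUs it sees, via tf.config.list_physical_devices. This sketch assumes the image's python3 can import TensorFlow (the log above comes from the TensorFlow runtime in that image); the function name tf_gpu_check is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask TensorFlow inside the image which GPUs it can see.
# Usage: tf_gpu_check /path/to/deepvariant-1.8.0-gpu.sif
tf_gpu_check() {
  local image="$1"
  # --nv makes the host NVIDIA driver libraries available inside the container.
  singularity exec --nv "$image" \
    python3 -c 'import tensorflow as tf; print("GPUs visible to TensorFlow:", tf.config.list_physical_devices("GPU"))'
}
```

Run inside a --gres=gpu:1 allocation, a working setup should print at least one PhysicalDevice; an empty list ([]) would be consistent with the cuInit errors above.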

Does the quick start test work on your system?
Please test with https://github.com/google/deepvariant/blob/r0.10/docs/deepvariant-quick-start.md.
Is there any way to reproduce the issue by using the quick start?

  • The quick start produces the expected outputs, but monitoring the GPU node with nvidia-smi shows that the job uses only the CPUs.
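
Watching nvidia-smi interactively can miss short bursts of GPU work. A more reliable check is to log utilization at a fixed interval for the whole run and look at the peak afterwards. The sampling command in the comment uses standard nvidia-smi query options; the peak_gpu_util helper and the gpu_util.log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sampling (run on the GPU node while the job is active; stop with Ctrl-C):
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 5 > gpu_util.log
#
# Hypothetical helper: read one utilization sample (percent) per line on stdin
# and print the highest value seen (0 if there are no samples).
peak_gpu_util() {
  awk 'BEGIN { max = 0 } { if ($1 + 0 > max) max = $1 + 0 } END { print max }'
}

# Example: peak_gpu_util < gpu_util.log
```

A peak of 0 over the whole run would confirm that the GPU was never used.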

Any additional context:

@kishwarshafin
Collaborator

@soymintc, can you run through https://docs.sylabs.io/guides/latest/user-guide/gpu.html and see if you can replicate the issue?
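
The Sylabs guide covers running GPU workloads with Singularity's --nv flag. A minimal version of that kind of check for this setup compares what the host and the container report, using nvidia-smi -L (which lists the GPUs the driver can see). This is a sketch, not taken from the guide; container_gpu_check is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list GPUs as seen from the host and from inside the image.
# Usage (inside a --gres=gpu:1 job): container_gpu_check /path/to/deepvariant-1.8.0-gpu.sif
container_gpu_check() {
  local image="$1"
  echo "== host =="
  nvidia-smi -L
  echo "== container (--nv) =="
  # --nv binds the host NVIDIA driver libraries and tools, including nvidia-smi, into the container.
  singularity exec --nv "$image" nvidia-smi -L
}
```

If the host lists the allocated GPU but the container does not, that would point toward how Singularity exposes the GPU; if neither lists it, the SLURM allocation itself would be the more likely place to look.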
