Connection Aborted Error when running PC-Relate on Hail #14788

Open
MiguelGuardado opened this issue Jan 13, 2025 · 1 comment
Labels
needs-triage A brand new issue that needs triaging.

Comments

@MiguelGuardado

What happened?

Hello, I am trying to use Hail to perform kinship analysis with PC-Relate on an HPC server. I was able to download all the dependencies needed to run the software (OpenJDK 11, recent versions of the C/C++ libraries, BLAS/LAPACK, and Python 3.9). I get this connection aborted error when I run the PC-Relate command, and I am not sure what the cause could be. I cannot share the exact VCF file I am using, but my code is fairly simple, and I am happy to provide a toy dataset if you believe it will help solve the issue. I have a feeling the error is related to some memory issue between Python and Java. Thank you for the work done to maintain this cool software!

Version

Version: 0.2.133

Relevant log output

[guardado075@dev2 run_hail]$ python3 hail_test.py 
Initializing Hail with default parameters...
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Running on Apache Spark version 3.5.4
SparkUI available at http://dev2.wynton.ucsf.edu:4041
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.133-4c60fddb171a
LOGGING: writing to /wynton/home/hernandez/guardado075/HernandezLab/TOPMED/run_hail/hail-20250113-1350-0.2.133-4c60fddb171a.log
2025-01-13 13:50:16.383 Hail: INFO: scanning VCF for sortedness...
SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
SLF4J: Defaulting to no-operation MDCAdapter implementation.
SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for further details.
2025-01-13 13:50:20.635 Hail: INFO: Coerced prefix-sorted VCF, requiring additional sorting within data partitions on each query.
2025-01-13 13:50:25.327 Hail: INFO: hwe_normalize: found 183638 variants after filtering out monomorphic sites.
2025-01-13 13:50:31.220 Hail: INFO: pca: running PCA with 10 components...) / 1]
2025-01-13 13:51:19.689 Hail: INFO: wrote table with 0 rows in 0 partitions to /tmp/persist_TablepbJw35EqtX
2025-01-13 13:51:37.227 Hail: INFO: wrote matrix with 184113 rows and 148 columns as 45 blocks of size 4096 to /tmp/M8uRhdZ8k2zj7izOCWjVid
/usr/lib/jvm/java-11/bin/java: symbol lookup error: /tmp/jniloader18012358189010753208netlib-native_system-linux-x86_64.so: undefined symbol: cblas_dgemm
Traceback (most recent call last):
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 534, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/urllib3/connection.py", line 516, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/http/client.py", line 1395, in getresponse
    response.begin()
  File "/usr/lib64/python3.11/http/client.py", line 325, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/http/client.py", line 294, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 841, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/urllib3/util/retry.py", line 474, in increment
    raise reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/urllib3/util/util.py", line 38, in reraise
    raise value.with_traceback(tb)
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 534, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/urllib3/connection.py", line 516, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/http/client.py", line 1395, in getresponse
    response.begin()
  File "/usr/lib64/python3.11/http/client.py", line 325, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/http/client.py", line 294, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/wynton/home/hernandez/guardado075/HernandezLab/TOPMED/run_hail/hail_test.py", line 5, in <module>
    rel = hl.pc_relate(ds.GT, 0.01, k=10, statistics='kin') 
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<decorator-gen-1726>", line 2, in pc_relate
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/hail/methods/relatedness/pc_relate.py", line 369, in pc_relate
    ).persist()
      ^^^^^^^^^
  File "<decorator-gen-1226>", line 2, in persist
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/hail/table.py", line 2761, in persist
    return Env.backend().persist(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/hail/backend/backend.py", line 292, in persist
    persisted = dataset.checkpoint(tempfile.__enter__())
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<decorator-gen-1216>", line 2, in checkpoint
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/hail/table.py", line 1963, in checkpoint
    self.write(output=output, overwrite=overwrite, stage_locally=stage_locally, _codec_spec=_codec_spec)
  File "<decorator-gen-1218>", line 2, in write
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/hail/typecheck/check.py", line 585, in wrapper
    return __original_func(*args_, **kwargs_)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/hail/table.py", line 2005, in write
    Env.backend().execute(
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/hail/backend/spark_backend.py", line 217, in execute
    raise err
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/hail/backend/spark_backend.py", line 209, in execute
    return super().execute(ir, timed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/hail/backend/backend.py", line 179, in execute
    result, timings = self._rpc(ActionTag.EXECUTE, payload)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/hail/backend/py4j_backend.py", line 218, in _rpc
    resp = self._requests_session.post(f'http://localhost:{port}{path}', data=data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/wynton/home/hernandez/guardado075/.local/lib/python3.11/site-packages/requests/adapters.py", line 682, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
MiguelGuardado added the needs-triage label on Jan 13, 2025
@MiguelGuardado

I apologize, I didn't include the simple script I am using to run PC-Relate. Here it is:

import hail as hl

ds = hl.import_vcf('prop_exome_full.vcf')

rel = hl.pc_relate(ds.GT, 0.01, k=10, statistics='kin')
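
The log above prints `symbol lookup error: ... undefined symbol: cblas_dgemm` from the Java process right before the Python side reports the aborted connection. One possible reading, not confirmed anywhere in this thread, is that the BLAS library found at runtime does not export the CBLAS interface. The sketch below is a rough way to check that from Python using only the standard library. The helper name `library_exports` and the list of library names it probes (`blas`, `cblas`, `openblas`, `lapack`) are assumptions for illustration; the names that matter on a given cluster may differ.

```python
import ctypes
import ctypes.util


def library_exports(lib_name, symbol):
    """Check whether a shared library exports a symbol.

    Returns True or False if the library could be loaded, and None if it
    could not be located or loaded at all. `lib_name` is the short name
    without the "lib" prefix or ".so" suffix (for example "blas").
    """
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # Attribute lookup on a CDLL raises AttributeError when the symbol
    # is missing, so hasattr() reports whether it is exported.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Hypothetical candidate names; adjust for the local system.
    for name in ("blas", "cblas", "openblas", "lapack"):
        result = library_exports(name, "cblas_dgemm")
        if result is None:
            status = "not found"
        elif result:
            status = "exports cblas_dgemm"
        else:
            status = "does NOT export cblas_dgemm"
        print(f"lib{name}: {status}")
```

If the library reported for `blas` does not export `cblas_dgemm` while another one does, that would at least be consistent with the symbol lookup error in the log, though it does not by itself prove that this is what terminates the JVM.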
