Class Paths Have Strange Behavior on 8.3.4 #13050

Open
4 tasks done
ilan-gold opened this issue Dec 11, 2024 · 2 comments
Labels
topic: collection related to the collection phase

Comments

@ilan-gold

ilan-gold commented Dec 11, 2024

Hello! We are having an issue in our repo where pytest 8.3.4 (but not 8.3.3) causes our classes to behave strangely at runtime, with errors like:

        if (dest_type, src_type, modifiers) not in self.write:
>           raise IORegistryError._from_write_parts(dest_type, src_type, modifiers)
E           anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'opt.hostedtoolcache.Python.3_12_7.x64.lib.python3_12.site-packages.anndata._core.views.AwkwardArrayView'> into <class 'h5py._hl.group.Group'>
E           Error raised while writing key 'awk_2d_ragged' of <class 'h5py._hl.group.Group'> to /obsm

when in reality, the AwkwardArrayView is in the self.write dictionary, but as
<class 'anndata._core.views.AwkwardArrayView'>:

[Screenshot 2024-12-11 at 11 34 41]

(This screenshot is from my attempt to fix the issue where I printed out the contents of self.write)
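
For anyone trying to confirm the same kind of mismatch, here is a minimal sketch of a helper that finds registry keys which look the same by class name but are different objects. The helper name `find_lookalike_keys`, the plain dict used as a registry, and the two stand-in classes are all hypothetical; nothing here is anndata's actual API.

```python
def find_lookalike_keys(registry, key):
    """Return keys of `registry` that match `key` by class name but not by identity."""

    def label(part):
        # Compare classes by their short name; compare everything else by value.
        return part.__qualname__ if isinstance(part, type) else part

    wanted = tuple(label(p) for p in key)
    return [
        k for k in registry
        if k != key and tuple(label(p) for p in k) == wanted
    ]


# Hypothetical stand-ins for the two AwkwardArrayView class objects:
# same class name, different __module__, and therefore different objects.
Registered = type("AwkwardArrayView", (), {"__module__": "anndata._core.views"})
Imported = type(
    "AwkwardArrayView", (), {"__module__": "opt.site_packages.anndata._core.views"}
)

registry = {(dict, Registered, frozenset()): "write_awkward"}
lookup = (dict, Imported, frozenset())

print(lookup in registry)  # False: the class objects differ
for near_miss in find_lookalike_keys(registry, lookup):
    # Show which module each look-alike class claims to come from.
    print(near_miss[1].__module__, "vs", lookup[1].__module__)
```

If this reports a near miss, the lookup is failing on class identity rather than on a missing registration.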

Also weirdly I cannot reproduce this locally. I have tried creating a local reproducer with an src/package/__init__.py file for a dummy package like:

d = {}

class Foo():
    pass

d[Foo] = True

def check(typ: type) -> bool:
    return d[typ]

__all__ = ["Foo", "check"]

with a single pytest test:

from package import Foo, check

def test_check():
    f = Foo()
    assert check(type(f))

But this seems to work when I run pytest instead of throwing a KeyError. I will continue to try to dig but I wanted to report.

See also my various attempts to fix the issue

Python Env:

anndata 0.11.1.dev18+g6594d93
anyio 4.6.2.post1
array-api-compat 1.9.1
asciitree 0.3.3
scikit-learn 1.5.2
scipy 1.14.1
seaborn 0.13.2
session-info 1.0.0
setuptools 75.6.0
setuptools-scm 8.1.0
six 1.16.0
sniffio 1.3.1
sortedcontainers 2.4.0
statsmodels 0.14.4
stdlib-list 0.11.0
tblib 3.0.0
textual 0.88.1
threadpoolctl 3.5.0
toolz 1.0.0
tornado 6.4.2
towncrier 24.8.0
tqdm 4.67.1
typing-extensions 4.12.2
tzdata 2024.2
uc-micro-py 1.0.3
umap-learn 0.5.7
urllib3 2.2.3
uv 0.5.5
zarr 2.18.3
zict 3.0.0

Compute Env:

Starting: Initialize job
Agent name: 'Azure Pipelines 5'
Agent machine name: 'fv-az360-809'
Current agent version: '4.248.0'
Downloading task: PublishPipelineMetadata (0.248.1)
Checking job knob settings.
   Knob: DockerActionRetries = true Source: $(VSTSAGENT_DOCKER_ACTION_RETRIES)
   Knob: AgentToolsDirectory = /opt/hostedtoolcache Source: ${AGENT_TOOLSDIRECTORY}
   Knob: UseGitLongPaths = true Source: $(USE_GIT_LONG_PATHS)
   Knob: AgentPerflog = /home/vsts/perflog Source: ${VSTS_AGENT_PERFLOG}
   Knob: EnableIssueSourceValidation = true Source: $(ENABLE_ISSUE_SOURCE_VALIDATION)
   Knob: AgentEnablePipelineArtifactLargeChunkSize = true Source: $(AGENT_ENABLE_PIPELINEARTIFACT_LARGE_CHUNK_SIZE)
   Knob: ContinueAfterCancelProcessTreeKillAttempt = true Source: $(VSTSAGENT_CONTINUE_AFTER_CANCEL_PROCESSTREEKILL_ATTEMPT)
   Knob: ProcessHandlerSecureArguments = false Source: $(AZP_75787_ENABLE_NEW_LOGIC)
   Knob: ProcessHandlerSecureArguments = false Source: $(AZP_75787_ENABLE_NEW_LOGIC_LOG)
   Knob: ProcessHandlerTelemetry = true Source: $(AZP_75787_ENABLE_COLLECT)
   Knob: UseNewNodeHandlerTelemetry = True Source: $(DistributedTask.Agent.USENEWNODEHANDLERTELEMETRY)
   Knob: ProcessHandlerEnableNewLogic = true Source: $(AZP_75787_ENABLE_NEW_PH_LOGIC)
   Knob: EnableResourceMonitorDebugOutput = true Source: $(AZP_ENABLE_RESOURCE_MONITOR_DEBUG_OUTPUT)
   Knob: EnableResourceUtilizationWarnings = true Source: $(AZP_ENABLE_RESOURCE_UTILIZATION_WARNINGS)
   Knob: IgnoreVSTSTaskLib = true Source: $(AZP_AGENT_IGNORE_VSTSTASKLIB)
   Knob: FailJobWhenAgentDies = true Source: $(FAIL_JOB_WHEN_AGENT_DIES)
   Knob: CheckForTaskDeprecation = true Source: $(AZP_AGENT_CHECK_FOR_TASK_DEPRECATION)
   Knob: CheckIfTaskNodeRunnerIsDeprecated246 = True Source: $(DistributedTask.Agent.CheckIfTaskNodeRunnerIsDeprecated246)
   Knob: UseNode20ToStartContainer = True Source: $(DistributedTask.Agent.UseNode20ToStartContainer)
   Knob: LogTaskNameInUserAgent = true Source: $(AZP_AGENT_LOG_TASKNAME_IN_USERAGENT)
   Knob: UseFetchFilterInCheckoutTask = true Source: $(AGENT_USE_FETCH_FILTER_IN_CHECKOUT_TASK)
   Knob: Rosetta2Warning = true Source: $(ROSETTA2_WARNING)
   Knob: AddForceCredentialsToGitCheckout = True Source: $(DistributedTask.Agent.AddForceCredentialsToGitCheckout)
Finished checking job knob settings.
Start tracking orphan processes.
Finishing: Initialize job

  • a detailed description of the bug or problem you are having
  • output of pip list from the virtual environment you are using
  • pytest and operating system versions
  • minimal example if possible (attempted)
@dongfangtianyu
Contributor

Transferring the traceback over:

__ test_backed_raw_subset[sparray_bool_subset-sparray_bool_subset-csr_matrix] __

tmp_path = PosixPath('/tmp/pytest-of-vsts/pytest-0/test_backed_raw_subset_sparray28')
array_type = <class 'scipy.sparse._csr.csr_matrix'>
subset_func = <function sparray_bool_subset at 0x7febceb65c60>
subset_func2 = <function sparray_bool_subset at 0x7febceb65c60>

    @pytest.mark.parametrize(
        "array_type",
        [
            pytest.param(asarray, id="dense_array"),
            pytest.param(sparse.csr_matrix, id="csr_matrix"),
            pytest.param(sparse.csr_array, id="csr_array"),
        ],
    )
    def test_backed_raw_subset(tmp_path, array_type, subset_func, subset_func2):
        backed_pth = tmp_path / "backed.h5ad"
        final_pth = tmp_path / "final.h5ad"
        mem_adata = gen_adata((10, 10), X_type=array_type)
        mem_adata.raw = mem_adata
        obs_idx = subset_func(mem_adata.obs_names)
        var_idx = subset_func2(mem_adata.var_names)
        if (
            array_type is asarray
            and isinstance(obs_idx, list | np.ndarray | sparse.spmatrix | SpArray)
            and isinstance(var_idx, list | np.ndarray | sparse.spmatrix | SpArray)
        ):
            pytest.xfail(
                "Fancy indexing does not work with multiple arrays on a h5py.Dataset"
            )
        mem_adata.write(backed_pth)
    
        ### Backed view has same values as in memory view ###
        backed_adata = ad.read_h5ad(backed_pth, backed="r")
        backed_v = backed_adata[obs_idx, var_idx]
        assert backed_v.is_view
        mem_v = mem_adata[obs_idx, var_idx]
    
        # Value equivalent
        assert_equal(mem_v, backed_v)
        # Type and value equivalent
        assert_equal(mem_v.copy(), backed_v.to_memory(copy=True), exact=True)
        assert backed_v.is_view
        assert backed_v.isbacked
    
        ### Write from backed view ###
>       backed_v.write_h5ad(final_pth)

tests/test_backed_hdf5.py:225: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/anndata/_core/anndata.py:1865: in write_h5ad
    write_h5ad(
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/anndata/_io/h5ad.py:105: in write_h5ad
    write_elem(f, "obsm", dict(adata.obsm), dataset_kwargs=dataset_kwargs)
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/anndata/_io/specs/registry.py:488: in write_elem
    Writer(_REGISTRY).write_elem(store, k, elem, dataset_kwargs=dataset_kwargs)
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/anndata/_io/utils.py:248: in func_wrapper
    return func(*args, **kwargs)
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/anndata/_io/specs/registry.py:355: in write_elem
    return write_func(store, k, elem, dataset_kwargs=dataset_kwargs)
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/anndata/_io/specs/registry.py:71: in wrapper
    result = func(g, k, *args, **kwargs)
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/anndata/_io/specs/methods.py:353: in write_mapping
    _writer.write_elem(g, sub_k, sub_v, dataset_kwargs=dataset_kwargs)
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/anndata/_io/utils.py:248: in func_wrapper
    return func(*args, **kwargs)
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/anndata/_io/specs/registry.py:352: in write_elem
    write_func = self.find_write_func(dest_type, elem, modifiers)
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/anndata/_io/specs/registry.py:319: in find_write_func
    return self.registry.get_write(dest_type, type(elem), modifiers, writer=self)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <anndata._io.specs.registry.IORegistry object at 0x7febceaf3260>
dest_type = <class 'h5py._hl.group.Group'>
src_type = <class 'opt.hostedtoolcache.Python.3_12_7.x64.lib.python3_12.site-packages.anndata._core.views.AwkwardArrayView'>
modifiers = frozenset()

    def get_write(
        self,
        dest_type: type,
        src_type: type | tuple[type, str],
        modifiers: frozenset[str] = frozenset(),
        *,
        writer: Writer,
    ) -> Write:
        import h5py
    
        if dest_type is h5py.File:
            dest_type = h5py.Group
    
        if (dest_type, src_type, modifiers) not in self.write:
>           raise IORegistryError._from_write_parts(dest_type, src_type, modifiers)
E           anndata._io.specs.registry.IORegistryError: No method registered for writing <class 'opt.hostedtoolcache.Python.3_12_7.x64.lib.python3_12.site-packages.anndata._core.views.AwkwardArrayView'> into <class 'h5py._hl.group.Group'>
E           Error raised while writing key 'awk_2d_ragged' of <class 'h5py._hl.group.Group'> to /obsm

/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/anndata/_io/specs/registry.py:135: IORegistryError

At first glance, the anomaly comes from within anndata. It requires people who are familiar with anndata to dig further.


self.write is a dict:

{
...
(<class 'zarr.hierarchy.Group'>, <class 'anndata._core.views.AwkwardArrayView'>, frozenset()): <function write_awkward at 0x7fbae30dc680>,  
(<class 'h5py._hl.group.Group'>, <class 'anndata._core.views.AwkwardArrayView'>, frozenset()): <function write_awkward at 0x7fbae30dc5e0>,  
...}

and

dest_type = <class 'h5py._hl.group.Group'>
src_type = <class 'opt.hostedtoolcache.Python.3_12_7.x64.lib.python3_12.site-packages.anndata._core.views.AwkwardArrayView'>
modifiers = frozenset()

result:

 (dest_type, src_type, modifiers) in self.write is False

diff:

 <class 'anndata._core.views.AwkwardArrayView'>,

vs

<class 'opt.hostedtoolcache.Python.3_12_7.x64.lib.python3_12.site-packages.anndata._core.views.AwkwardArrayView'>
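
One way to check whether a diff like this comes from the same source file being loaded under two module names is to group the entries of `sys.modules` by `__file__`. The following is only a diagnostic sketch: the function name `modules_loaded_twice` and the fake module objects are made up for illustration, and some standard-library modules legitimately appear under more than one name (for example `os.path`), so real output needs reading with care.

```python
import sys
import types
from collections import defaultdict


def modules_loaded_twice(modules=None):
    """Map each source file to the module names it is loaded under, if more than one."""
    modules = sys.modules if modules is None else modules
    by_file = defaultdict(list)
    for name, mod in list(modules.items()):
        # Built-in and namespace modules may have no __file__ (or it may be None).
        path = getattr(mod, "__file__", None)
        if path:
            by_file[path].append(name)
    return {path: sorted(names) for path, names in by_file.items() if len(names) > 1}


# Hypothetical modules standing in for the two import results seen above.
normal = types.ModuleType("anndata._core.views")
normal.__file__ = "/site-packages/anndata/_core/views.py"
odd = types.ModuleType("opt.site_packages.anndata._core.views")
odd.__file__ = "/site-packages/anndata/_core/views.py"
other = types.ModuleType("anndata._core.other")
other.__file__ = "/site-packages/anndata/_core/other.py"

fake_modules = {m.__name__: m for m in (normal, odd, other)}
duplicates = modules_loaded_twice(fake_modules)
for path, names in duplicates.items():
    print(path, "->", names)
```

Calling `modules_loaded_twice()` with no argument inside a failing test (for example from a fixture) would inspect the real `sys.modules` of the CI run.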

@dongfangtianyu
Contributor

dongfangtianyu commented Dec 11, 2024

I guess something happened during the import process that caused the same class to be imported in two ways: once via a relative path and once via an absolute path.
The two results are then treated as different objects, so the membership check evaluates to False.
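
To illustrate the guess above, here is a small self-contained sketch showing that loading one source file under two module names produces two distinct class objects, so a dict keyed on one does not contain the other. This is not what pytest does internally; the module names, the `load_as` helper, and the temporary file are assumptions made up for the demonstration.

```python
import importlib.util
import tempfile
from pathlib import Path

# Source modelled on the dummy package from the report.
SOURCE = "class Foo:\n    pass\n\nregistry = {Foo: True}\n"


def load_as(name, path):
    """Execute the file at `path` as a fresh module called `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "package.py"
    path.write_text(SOURCE)
    # Same file, two different module names (both names are hypothetical).
    first = load_as("package", path)
    second = load_as("tmp.some.absolute.path.package", path)

print(first.Foo is second.Foo)       # False: two separate class objects
print(second.Foo in first.registry)  # False: the lookup misses
print(first.Foo.__module__, "vs", second.Foo.__module__)
```

If something similar happens during collection, it would explain why a class registered at import time under one name is not found when an instance created through the other import path is looked up.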

From the changelog, it seems to be related to #12592

@Zac-HD Zac-HD added the topic: collection related to the collection phase label Jan 9, 2025