
GzipFile.readinto reads full file before copying into the provided buffer #128646

Open
effigies opened this issue Jan 8, 2025 · 0 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error


effigies commented Jan 8, 2025

Bug report

Bug description:

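`gzip.GzipFile` does not define `.readinto()`, so it inherits the generic `io.BufferedIOBase` implementation. That implementation calls `.read(len(b))` and copies the returned `bytes` object into the buffer. The whole requested range is therefore materialized as a temporary object before the copy, which negates the purpose of using `.readinto()` at all.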

This may be considered more a missed optimization than a bug, but it is being reported by users of downstream tools, and I've traced it back to CPython.
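The delegation described above can be observed with a minimal `io.BufferedIOBase` subclass that implements only `read()`. `RecordingReader` is a hypothetical class written for illustration, not part of `gzip`. It logs every size passed to `read()`, showing that the inherited `readinto()` issues a single `read(len(b))` call and then copies the result.

```python
import io


class RecordingReader(io.BufferedIOBase):
    """Hypothetical reader that implements read() only and logs each request."""

    def __init__(self, data):
        self._data = bytes(data)
        self._pos = 0
        self.requested = []  # sizes passed to read()

    def readable(self):
        return True

    def read(self, size=-1):
        self.requested.append(size)
        if size is None or size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]  # slice of the underlying data
        self._pos += len(chunk)
        return chunk


r = RecordingReader(b"hello")
buf = bytearray(8)
n = r.readinto(buf)  # inherited io.BufferedIOBase.readinto
print(n, r.requested)  # → 5 [8]
```

For an 8-byte buffer, the inherited method asks `read()` for all 8 bytes at once. With a 50 MiB buffer, it would likewise request a single 50 MiB `bytes` object.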

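```python
import os
from gzip import GzipFile

n_mbs = 50

# Write 50 MiB of incompressible data
with GzipFile('test.gz', mode='wb') as fobj:
    for _ in range(n_mbs):
        fobj.write(os.urandom(2**20))

# Preallocate a buffer large enough for the decompressed contents
buffer = bytearray(n_mbs * 2**20)

with GzipFile('test.gz', mode='rb') as fobj:
    fobj.readinto(buffer)
```

Saved as `load_file.py` and profiled with memray:

```shell
memray run load_file.py
memray flamegraph memray-*.bin && rm memray-*.bin
```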
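A dependency-free variant of the reproducer is sketched below. It assumes that `tracemalloc`'s view of Python-level allocations is enough to show the effect, and it is not a substitute for memray's native tracking. `peak_during_readinto` is a hypothetical helper. It allocates the destination buffer before tracing starts, so only temporaries created during `readinto()` count toward the peak. Based on the profiles below, the traced peak on affected versions should be on the order of the payload size, because `read()` materializes it. The exact numbers will vary by version and platform.

```python
import gzip
import os
import tempfile
import tracemalloc


def peak_during_readinto(size):
    """Return (bytes_read, traced_peak, payload, buffer) for one readinto() call.

    Hypothetical helper: tracing starts after the destination buffer exists,
    so the peak reflects only temporaries allocated while readinto() runs.
    """
    payload = os.urandom(size)  # incompressible, like the original repro
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.gz")
        with gzip.GzipFile(path, mode="wb") as fobj:
            fobj.write(payload)

        buffer = bytearray(size)
        with gzip.GzipFile(path, mode="rb") as fobj:
            tracemalloc.start()
            try:
                n = fobj.readinto(buffer)
                _, peak = tracemalloc.get_traced_memory()
            finally:
                tracemalloc.stop()
    return n, peak, payload, buffer


n, peak, payload, buffer = peak_during_readinto(8 * 2**20)
print(f"read {n} bytes; traced peak during readinto(): {peak / 2**20:.1f} MiB")
```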

Current memory profile

[memray flamegraph image]

Duration: 0:00:01.821000
Total number of allocations: 5064
Total number of frames seen: 85
Peak memory usage: 116.3 MiB
Python allocator: pymalloc

Patched memory profile

[memray flamegraph image]

Duration: 0:00:01.828000
Total number of allocations: 3317
Total number of frames seen: 79
Peak memory usage: 66.2 MiB
Python allocator: pymalloc
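As a rough consistency check, the difference in peak memory between the two profiles matches the size of one intermediate copy of the 50 MiB payload. This is an interpretation of the numbers above, not a separate measurement:

```math
116.3\,\text{MiB} - 66.2\,\text{MiB} = 50.1\,\text{MiB} \approx n_{\text{mbs}} \times 2^{20}\,\text{B} = 50\,\text{MiB}
```

In other words, the patched run still holds the 50 MiB destination buffer. It no longer also holds a second 50 MiB `bytes` object alongside it.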

Patch

```diff
diff --git a/Lib/gzip.py b/Lib/gzip.py
index 1a3c82ce7e0..21bb4b085fd 100644
--- a/Lib/gzip.py
+++ b/Lib/gzip.py
@@ -338,6 +338,20 @@ def read1(self, size=-1):
             size = io.DEFAULT_BUFFER_SIZE
         return self._buffer.read1(size)
 
+    def readinto(self, b):
+        self._check_not_closed()
+        if self.mode != READ:
+            import errno
+            raise OSError(errno.EBADF, "readinto() on write-only GzipFile object")
+        return self._buffer.readinto(b)
+
+    def readinto1(self, b):
+        self._check_not_closed()
+        if self.mode != READ:
+            import errno
+            raise OSError(errno.EBADF, "readinto1() on write-only GzipFile object")
+        return self._buffer.readinto1(b)
+
     def peek(self, n):
         self._check_not_closed()
         if self.mode != READ:
```
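Code that must run on unpatched versions can work around the problem by filling the buffer in bounded chunks, so that no temporary object is larger than one chunk. This is one possible sketch. `readinto_chunked` and its `chunk_size` parameter are hypothetical names, and the helper relies only on the standard `read()` and `memoryview` APIs. It still copies once per chunk, unlike a true `readinto()` that decompresses directly into the caller's buffer.

```python
import gzip
import io
import os


def readinto_chunked(fobj, buffer, chunk_size=1 << 20):
    """Fill `buffer` from `fobj` using reads of at most `chunk_size` bytes.

    Hypothetical helper: the largest temporary bytes object is bounded by
    chunk_size instead of len(buffer). Returns the number of bytes written.
    """
    view = memoryview(buffer).cast("B")  # writable, byte-addressable view
    total = 0
    while total < len(view):
        chunk = fobj.read(min(chunk_size, len(view) - total))
        if not chunk:  # EOF reached before the buffer was full
            break
        view[total:total + len(chunk)] = chunk
        total += len(chunk)
    return total


# Usage: decompress an in-memory gzip stream into a preallocated buffer.
payload = os.urandom(3 * 2**20)
with gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(payload)), mode="rb") as fobj:
    buffer = bytearray(len(payload))
    n = readinto_chunked(fobj, buffer, chunk_size=256 * 1024)
```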

I believe this should be an uncontroversial patch, so I will open a PR immediately.

cc @psadil

CPython versions tested on:

3.9, 3.10, 3.11, 3.12, 3.13, CPython main branch

Operating systems tested on:

Linux


@effigies effigies added the type-bug An unexpected behavior, bug, or error label Jan 8, 2025
effigies added a commit to effigies/cpython that referenced this issue Jan 8, 2025
effigies added a commit to effigies/cpython that referenced this issue Jan 8, 2025
@ZeroIntensity ZeroIntensity added the stdlib Python modules in the Lib dir label Jan 8, 2025