GzipFile.readinto reads full file before copying into the provided buffer #128646

effigies · 2025-01-08T20:01:31Z

Bug report

Bug description:

gzip.GzipFile inherits the BufferedIOBase implementation of .readinto(), which calls .read(len(b)) and copies the resulting bytes object into the provided buffer. The decompressed data is therefore held in memory twice, which negates the purpose of using .readinto() at all.

This may be considered more a missed optimization than a bug, but it is being reported in downstream tools and I've traced it back to CPython.
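To illustrate the fallback described above, here is a minimal sketch. `ReadOnlyDemo` is a hypothetical class, not part of CPython; it implements only `read()` and records the sizes it is asked for, so the inherited `readinto()` can be seen requesting a full-size temporary bytes object before copying.

```python
import io


class ReadOnlyDemo(io.BufferedIOBase):
    """Hypothetical stream that implements read() only (illustration, not CPython code)."""

    def __init__(self, payload):
        self._inner = io.BytesIO(payload)
        self.read_sizes = []  # sizes requested through read()

    def readable(self):
        return True

    def read(self, size=-1):
        # Record the request, then hand back a freshly allocated bytes object.
        self.read_sizes.append(size)
        return self._inner.read(size)


demo = ReadOnlyDemo(b"x" * 1024)
target = bytearray(256)

# readinto() is inherited from BufferedIOBase: it calls read(len(target))
# and copies the returned bytes into target.
n = demo.readinto(target)
print(n, demo.read_sizes)
```

With a 50 MiB target buffer, that intermediate bytes object is also 50 MiB.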

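import os
from gzip import GzipFile

n_mbs = 50

# Write 50 MiB of incompressible random data so the decompressed size is known.
with GzipFile('test.gz', mode='wb') as fobj:
    for _ in range(n_mbs):
        fobj.write(os.urandom(2**20))

# Preallocate a buffer large enough for the whole decompressed payload.
buffer = bytearray(n_mbs * 2**20)

# Ideally this decompresses straight into `buffer` with no large temporary.
with GzipFile('test.gz', mode='rb') as fobj:
    fobj.readinto(buffer)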

With the script above saved as load_file.py, profile it with memray:

memray run load_file.py
memray flamegraph memray-*.bin && rm memray-*.bin

Current memory profile

Duration: 0:00:01.821000
Total number of allocations: 5064
Total number of frames seen: 85
Peak memory usage: 116.3 MiB
Python allocator: pymalloc

Patched memory profile

Duration: 0:00:01.828000
Total number of allocations: 3317
Total number of frames seen: 79
Peak memory usage: 66.2 MiB
Python allocator: pymalloc
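The roughly 50 MiB difference in peak usage between the two profiles is consistent with the size of the temporary bytes object. For anyone without memray, a rough check is possible with the standard library's tracemalloc. This is only a sketch: `peak_during_readinto` is a hypothetical helper, it uses a smaller 4 MiB payload to keep the run short, and tracemalloc only sees allocations made through Python's allocators, so the numbers will not match memray's.

```python
import gzip
import os
import tempfile
import tracemalloc


def peak_during_readinto(n_bytes):
    """Return (bytes_read, peak_traced_bytes) for one GzipFile.readinto() call."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.gz")
        with gzip.GzipFile(path, mode="wb") as fobj:
            fobj.write(os.urandom(n_bytes))

        # Allocate the target before tracing starts so it is not counted.
        buffer = bytearray(n_bytes)

        with gzip.GzipFile(path, mode="rb") as fobj:
            tracemalloc.start()
            try:
                n_read = fobj.readinto(buffer)
                _, peak = tracemalloc.get_traced_memory()
            finally:
                tracemalloc.stop()
    return n_read, peak


n_read, peak = peak_during_readinto(4 * 2**20)
print(f"read {n_read} bytes, peak traced allocation {peak / 2**20:.1f} MiB")
```

On an interpreter without the patch, the traced peak should be on the order of the payload size; with the patch it should be much smaller.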

Patch

diff --git a/Lib/gzip.py b/Lib/gzip.py
index 1a3c82ce7e0..21bb4b085fd 100644
--- a/Lib/gzip.py
+++ b/Lib/gzip.py
@@ -338,6 +338,20 @@ def read1(self, size=-1):
             size = io.DEFAULT_BUFFER_SIZE
         return self._buffer.read1(size)
 
+    def readinto(self, b):
+        self._check_not_closed()
+        if self.mode != READ:
+            import errno
+            raise OSError(errno.EBADF, "readinto() on write-only GzipFile object")
+        return self._buffer.readinto(b)
+
+    def readinto1(self, b):
+        self._check_not_closed()
+        if self.mode != READ:
+            import errno
+            raise OSError(errno.EBADF, "readinto1() on write-only GzipFile object")
+        return self._buffer.readinto1(b)
+
     def peek(self, n):
         self._check_not_closed()
         if self.mode != READ:

I believe this should be an uncontroversial patch, so I will open a PR immediately.
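Until a fix lands, downstream code could apply the same idea in a subclass. The following is a sketch only: `ReadIntoGzipFile` is a hypothetical name, and it relies on the private `_buffer` attribute that GzipFile sets up in read mode, which is an implementation detail and may change between versions.

```python
import errno
import gzip
import io


class ReadIntoGzipFile(gzip.GzipFile):
    """Hypothetical workaround mirroring the patch; relies on private _buffer."""

    def readinto(self, b):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        if not self.readable():
            raise OSError(errno.EBADF, "readinto() on write-only GzipFile object")
        # Delegate to the internal BufferedReader, which fills b directly.
        return self._buffer.readinto(b)


# Round-trip a small payload through an in-memory gzip stream.
payload = bytes(range(256)) * 64
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as fobj:
    fobj.write(payload)

raw.seek(0)
out = bytearray(len(payload))
with ReadIntoGzipFile(fileobj=raw, mode="rb") as fobj:
    n = fobj.readinto(out)
```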

cc @psadil

CPython versions tested on:

3.9, 3.10, 3.11, 3.12, 3.13, CPython main branch

Operating systems tested on:

Linux

Linked PRs

gh-128646: Implement GzipFile.readinto() functions #128647

effigies added the type-bug An unexpected behavior, bug, or error label Jan 8, 2025

effigies mentioned this issue Jan 8, 2025

gh-128646: Implement GzipFile.readinto() functions #128647

Open

effigies added a commit to effigies/cpython that referenced this issue Jan 8, 2025

pythongh-128646: Implement GzipFile.readinto() functions

0530957

effigies added a commit to effigies/cpython that referenced this issue Jan 8, 2025

pythongh-128646: Implement GzipFile.readinto() functions

2ac657d

ZeroIntensity added the stdlib Python modules in the Lib dir label Jan 8, 2025