ZIP Extraction
Extractor(file)
High-level, streaming ZIP extractor.
Wraps AsyncZipFile and exposes a simplified interface for extracting ZIP
contents as a stream of Chunk objects.
Extraction is sequential and streaming. Common safety checks are supported, including MIME-based filtering, file size limits, and basic ZIP bomb protection.
extract(*, content_types=None, max_compression_ratio=None, max_file_size=None, chunk_size=None)
async
Extracts files from the ZIP archive as a stream of chunks.
Files are processed sequentially. Each file is opened, validated against the provided constraints, streamed as chunks, and closed before the next entry is processed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| content_types | Sequence[str] \| None | Optional list of allowed MIME types. Only files whose extensions match these types are extracted. | None |
| max_compression_ratio | float \| None | Maximum allowed ratio of uncompressed size to compressed size. Used as basic ZIP bomb protection. | None |
| max_file_size | int \| None | Maximum allowed uncompressed file size in bytes. | None |
| chunk_size | int \| None | Optional chunk size for streaming file contents. | None |
Yields:
| Type | Description |
|---|---|
| ContentStream | Chunk objects containing file path, content bytes, and byte offsets. |
Raises:
| Type | Description |
|---|---|
| ValueError | If a file exceeds size or compression constraints. |
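The filtering and limit checks described for extract can be illustrated with the standard library alone. The following is a minimal sketch, not the extractor's actual implementation. The names check_entry and build_archive are hypothetical, and the sketch assumes that limits are evaluated against the sizes declared in each entry's zipfile.ZipInfo and that MIME types are inferred from the file extension with mimetypes.guess_type.

```python
import io
import mimetypes
import zipfile


def check_entry(info, *, content_types=None, max_compression_ratio=None,
                max_file_size=None):
    """Return True if the entry should be extracted, False if it is filtered out.

    Raises ValueError when a size or compression limit is exceeded.
    Hypothetical helper: an assumption about how such checks could look.
    """
    if content_types is not None:
        # Assumption: the MIME type is guessed from the file extension.
        guessed, _encoding = mimetypes.guess_type(info.filename)
        if guessed not in content_types:
            return False  # filtered out silently, not an error

    if max_file_size is not None and info.file_size > max_file_size:
        raise ValueError(
            f"{info.filename}: {info.file_size} bytes exceeds max_file_size"
        )

    # Skip the ratio check for entries with no compressed payload to avoid
    # dividing by zero.
    if max_compression_ratio is not None and info.compress_size > 0:
        ratio = info.file_size / info.compress_size
        if ratio > max_compression_ratio:
            raise ValueError(
                f"{info.filename}: compression ratio {ratio:.1f} exceeds limit"
            )
    return True


def build_archive():
    """Build a small in-memory archive with one highly compressible entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("notes.txt", "hello")
        zf.writestr("bomb.txt", b"\0" * 1_000_000)
    buf.seek(0)
    return buf
```

Declared sizes come from archive metadata and can be forged, so a check of this kind is only a first line of defense. Counting bytes while streaming would also catch entries that misreport their size.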
Note
Filenames are normalized using normalize_filename to handle
potentially invalid, inconsistent, or unsafe names stored in ZIP
archives.
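To show the kind of problem that filename normalization addresses, here is a minimal sketch of a sanitizer for archive member names. It is not the library's normalize_filename, whose exact rules are not documented here. The function sanitize_member_name is a hypothetical name, and its rules (treat backslashes as separators, drop NUL characters, remove empty, "." and ".." segments) are assumptions chosen for illustration.

```python
def sanitize_member_name(name: str) -> str:
    """Return a relative, forward-slash path with unsafe segments removed.

    Hypothetical helper for illustration only.
    """
    # Some tools store Windows-style separators inside archives.
    name = name.replace("\\", "/")
    # NUL characters are never valid in a path component.
    name = name.replace("\x00", "")

    parts = []
    for part in name.split("/"):
        if part in ("", "."):
            # Skips leading slashes (absolute paths), doubled slashes, and "."
            continue
        if part == "..":
            # Never climb above the extraction root.
            if parts:
                parts.pop()
            continue
        parts.append(part)
    return "/".join(parts)
```

With these rules, a traversal attempt such as "../../etc/passwd" resolves to a path inside the extraction root rather than outside it.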
AsyncZipFile(file, *, timeout=None)
Asynchronous reader for ZIP archives.
Provides controlled, non-blocking access to ZIP files by opening and closing
the underlying zipfile.ZipFile in worker threads.
A strict open/close lifecycle is enforced. Resource cleanup is safe under cancellation. Archive entries are exposed through an asynchronous interface.
open()
async
Opens the ZIP archive asynchronously.
The archive is initialized in a worker thread to avoid blocking the Trio event loop.
Raises:
| Type | Description |
|---|---|
| RuntimeError | If the archive is already open or opening is in progress. |
| OpeningAbortError | If the open operation is aborted. |
| BaseException | Any unexpected error during ZIP initialization. |
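The pattern of opening an archive in a worker thread while guarding against double opens can be sketched as follows. This is an illustration under stated assumptions, not the AsyncZipFile source: ArchiveSketch and its attributes are hypothetical, and the sketch uses the standard library's asyncio.to_thread as a stand-in for Trio's trio.to_thread.run_sync so that it runs without third-party dependencies.

```python
import asyncio
import io
import zipfile


class ArchiveSketch:
    """Hypothetical stand-in mirroring the open/close lifecycle."""

    def __init__(self, file):
        self._file = file
        self._zip = None        # set once the archive is open
        self._opening = False   # True while open() is in flight

    async def open(self):
        if self._zip is not None or self._opening:
            raise RuntimeError(
                "archive is already open or opening is in progress"
            )
        self._opening = True
        try:
            # zipfile.ZipFile reads the central directory on construction,
            # which is blocking I/O, so it runs in a worker thread.
            self._zip = await asyncio.to_thread(zipfile.ZipFile, self._file)
        finally:
            self._opening = False

    async def close(self):
        # Clear the reference first so a second close() is a no-op.
        zf, self._zip = self._zip, None
        if zf is not None:
            await asyncio.to_thread(zf.close)
```

Tracking the in-flight state separately from the open state is what lets a second concurrent open() fail fast instead of creating two ZipFile objects over the same file.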
files()
async
Iterates over file entries in the ZIP archive.
Directory entries are skipped. Each yielded object represents a single file inside the archive and must be opened explicitly before reading.
Yields:
| Type | Description |
|---|---|
| AsyncGenerator[AsyncZipMember, None] | AsyncZipMember instances for each non-directory entry. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If the ZIP archive has not been opened. |
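Skipping directory entries can be shown with the standard library's zipfile.ZipInfo.is_dir. The sketch below is synchronous and only illustrates the filtering step. The function file_entries is a hypothetical name, and how files() performs this check internally is an assumption.

```python
import io
import zipfile


def file_entries(zf):
    """Yield ZipInfo objects for non-directory entries only."""
    for info in zf.infolist():
        if info.is_dir():
            continue  # directory entries carry no file content
        yield info


# Build a small in-memory archive with one directory and one file.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("docs/", b"")
    zf.writestr("docs/readme.txt", "hi")

with zipfile.ZipFile(archive) as zf:
    names = [info.filename for info in file_entries(zf)]
```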
AsyncZipMember(info, file)
Asynchronous wrapper around a single file inside a ZIP archive.
Provides an async-friendly interface for opening, reading, and closing individual archive members.
The underlying file handle is opened in a worker thread and guarded by explicit state tracking. Reads are protected by a lock to prevent concurrent access. Cleanup is safe under cancellation.
A member must be opened before reading and should be closed explicitly, or via an async context manager, once processing is complete.
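The combination of explicit state tracking, lock-protected reads, and an async context manager can be sketched as below. This is not the AsyncZipMember source: MemberSketch, its read method, and the choice to open the member on entering the context manager are all assumptions made for illustration. The sketch uses asyncio.to_thread and asyncio.Lock from the standard library as stand-ins for their Trio counterparts.

```python
import asyncio
import io
import zipfile


class MemberSketch:
    """Hypothetical wrapper around one entry of an open zipfile.ZipFile."""

    def __init__(self, info, archive):
        self._info = info
        self._archive = archive
        self._fh = None                # underlying file handle once opened
        self._lock = asyncio.Lock()    # serializes reads on the handle

    async def open(self):
        if self._fh is not None:
            raise RuntimeError("member is already open")
        self._fh = await asyncio.to_thread(self._archive.open, self._info)

    async def read(self, n=-1):
        if self._fh is None:
            raise RuntimeError("member must be opened before reading")
        # Only one task may touch the file handle at a time.
        async with self._lock:
            return await asyncio.to_thread(self._fh.read, n)

    async def close(self):
        # Clear the reference first so close() is safe to call twice.
        fh, self._fh = self._fh, None
        if fh is not None:
            await asyncio.to_thread(fh.close)

    async def __aenter__(self):
        await self.open()   # assumption: entering the context opens the member
        return self

    async def __aexit__(self, *exc_info):
        await self.close()
```

Used as `async with MemberSketch(info, zf) as member:`, the handle is released even if the body raises.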
open()
async
Opens the ZIP member asynchronously.
The underlying file object is obtained in a worker thread to avoid blocking the Trio event loop.
Raises:
| Type | Description |
|---|---|
| RuntimeError | If the member is already open or opening is in progress. |
| OpeningAbortError | If the open operation is aborted. |
| BaseException | Any unexpected error raised while opening the file. |
chunks(chunk_size=None)
async
Streams the contents of the ZIP member as byte chunks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunk_size | int \| None | Maximum number of bytes per chunk. If not provided, the file is read until EOF in a single call. | None |
Yields:
| Type | Description |
|---|---|
| AsyncGenerator[bytes, None] | Byte chunks read from the ZIP member. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If the member was not opened before reading. |
| OpeningAbortError | If the opening process was aborted. |
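The chunk_size behavior documented for chunks (bounded reads when a size is given, a single read to EOF otherwise) can be sketched with a plain generator over a standard-library ZIP member. The function iter_chunks is a hypothetical, synchronous stand-in. Yielding nothing for an empty member is an assumption, not documented behavior.

```python
import io
import zipfile


def iter_chunks(fileobj, chunk_size=None):
    """Yield the contents of a binary file object as byte chunks."""
    if chunk_size is None:
        # No size given: read everything up to EOF in one call.
        data = fileobj.read()
        if data:
            yield data
        return
    while True:
        data = fileobj.read(chunk_size)
        if not data:
            break  # an empty read signals EOF
        yield data


# Build a small in-memory archive and stream one member in 4-byte chunks.
payload = b"0123456789"
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("data.bin", payload)

with zipfile.ZipFile(archive) as zf, zf.open("data.bin") as member:
    pieces = list(iter_chunks(member, chunk_size=4))
```

Passing a chunk_size keeps memory use bounded by the chunk rather than by the uncompressed file, which matters most for the large entries that the size limits are meant to guard against.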
OpeningAbortError
Bases: Exception
Raised when an asynchronous ZIP open operation is aborted.
Indicates that an attempt to open a ZIP file or one of its members did not complete successfully and was explicitly marked as aborted.