ZIP Extraction

Extractor(file)

High-level, streaming ZIP extractor.

Wraps AsyncZipFile and exposes a simplified interface for extracting ZIP contents as a stream of Chunk objects.

Extraction is sequential and streaming. Common safety checks are supported, including MIME-based filtering, file size limits, and basic ZIP bomb protection.

extract(*, content_types=None, max_compression_ratio=None, max_file_size=None, chunk_size=None) async

Extracts files from the ZIP archive as a stream of chunks.

Files are processed sequentially. Each file is opened, validated against the provided constraints, streamed as chunks, and closed before the next entry is processed.

Parameters:

content_types (Sequence[str] | None, default: None)

Optional list of allowed MIME types. Only files whose MIME type, as inferred from the filename extension, appears in this list are extracted.

max_compression_ratio (float | None, default: None)

Maximum allowed ratio of uncompressed size to compressed size. Used as basic ZIP bomb protection.

max_file_size (int | None, default: None)

Maximum allowed uncompressed file size in bytes.

chunk_size (int | None, default: None)

Optional chunk size, in bytes, for streaming file contents.

Yields:

Type Description
ContentStream

Chunk objects containing file path, content bytes, and byte offsets.
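The sequential, chunked flow described for extract() can be approximated with the standard library alone. The sketch below is synchronous and is not the library's implementation: the Chunk dataclass, its field names (path, data, start, end), and the helper names iter_chunks and build_demo_zip are all assumptions made for illustration.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk shape; the real Chunk fields may differ."""

    path: str
    data: bytes
    start: int  # offset of the first byte within the uncompressed file
    end: int  # offset one past the last byte


def iter_chunks(archive: zipfile.ZipFile, chunk_size: int = 4) -> Iterator[Chunk]:
    """Yield chunks entry by entry; each entry is closed before the next opens."""
    for info in archive.infolist():
        if info.is_dir():
            continue  # directory entries carry no content
        offset = 0
        with archive.open(info) as member:
            while data := member.read(chunk_size):
                yield Chunk(info.filename, data, offset, offset + len(data))
                offset += len(data)


def build_demo_zip() -> zipfile.ZipFile:
    """Create a small in-memory archive to drive the sketch."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("a.txt", b"hello world")
        zf.writestr("b.txt", b"xyz")
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


demo_chunks = list(iter_chunks(build_demo_zip(), chunk_size=4))
```

Joining the data of all chunks that share a path reproduces the original file, and the start/end offsets let a consumer detect gaps or reordering.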

Raises:

Type Description
ValueError

If a file exceeds max_file_size or max_compression_ratio.
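To make the constraints concrete, here is a minimal sketch of how such checks could be written against zipfile.ZipInfo metadata. It is an illustration under assumptions, not the library's code: the function names is_allowed and check_limits are hypothetical, the MIME type is guessed with mimetypes.guess_type, and treating None as "no filter" is assumed rather than documented.

```python
import io
import mimetypes
import zipfile
from typing import Optional, Sequence


def is_allowed(filename: str, content_types: Optional[Sequence[str]]) -> bool:
    """Return True when the MIME type guessed from the extension is allowed."""
    if content_types is None:
        return True  # assumption: None disables the filter
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed in content_types


def check_limits(
    info: zipfile.ZipInfo,
    max_compression_ratio: Optional[float] = None,
    max_file_size: Optional[int] = None,
) -> None:
    """Raise ValueError when the sizes declared in the ZIP header exceed a limit."""
    if max_file_size is not None and info.file_size > max_file_size:
        raise ValueError(f"{info.filename}: {info.file_size} bytes exceeds {max_file_size}")
    if max_compression_ratio is not None and info.compress_size > 0:
        ratio = info.file_size / info.compress_size
        if ratio > max_compression_ratio:
            raise ValueError(f"{info.filename}: compression ratio {ratio:.1f} exceeds limit")


# Build a tiny archive whose single entry compresses extremely well.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("zeros.txt", b"0" * 100_000)

with zipfile.ZipFile(buffer) as zf:
    entry = zf.getinfo("zeros.txt")

try:
    check_limits(entry, max_compression_ratio=10.0)
    rejected = False
except ValueError:
    rejected = True
```

These checks trust the sizes recorded in the archive headers, which a crafted archive can misstate. A stricter variant would also count bytes while streaming and stop once a limit is crossed.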

Note

Filenames are normalized using normalize_filename to handle potentially invalid, inconsistent, or unsafe names stored in ZIP archives.
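The exact rules applied by normalize_filename are not described here. As a rough illustration of the kind of cleanup such a helper might perform, the hypothetical sketch_normalize_filename below unifies separators, removes NUL bytes, and drops empty, ".", "..", and drive-letter components. Every rule in it is an assumption.

```python
def sketch_normalize_filename(name: str) -> str:
    """Hypothetical normalizer; not the library's normalize_filename."""
    # Treat backslashes as separators and remove embedded NUL bytes.
    name = name.replace("\\", "/").replace("\x00", "")
    # Drop empty, current-directory, and parent-directory components.
    parts = [part for part in name.split("/") if part not in ("", ".", "..")]
    # Drop a leading Windows drive prefix such as "C:".
    if parts and len(parts[0]) == 2 and parts[0][0].isalpha() and parts[0][1] == ":":
        parts = parts[1:]
    return "/".join(parts)


traversal = sketch_normalize_filename("../../etc/passwd")
windows = sketch_normalize_filename("C:\\Users\\x.txt")
absolute = sketch_normalize_filename("/abs//path/./f.txt")
```

The result is always a relative, forward-slash path, so joining it onto an extraction directory cannot escape that directory through ".." or an absolute prefix.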

AsyncZipFile(file, *, timeout=None)

Asynchronous reader for ZIP archives.

Provides controlled, non-blocking access to ZIP files by opening and closing the underlying zipfile.ZipFile in worker threads.

A strict open/close lifecycle is enforced. Resource cleanup is safe under cancellation. Archive entries are exposed through an asynchronous interface.

open() async

Opens the ZIP archive asynchronously.

The archive is initialized in a worker thread to avoid blocking the Trio event loop.

Raises:

Type Description
RuntimeError

If the archive is already open or opening is in progress.

OpeningAbortError

If the open operation is aborted.

BaseException

Any unexpected error during ZIP initialization.
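The open lifecycle can be pictured as a small state machine around a worker-thread call. The sketch below is hypothetical and deliberately uses asyncio.to_thread as a standard-library stand-in for Trio's worker threads, so it runs without extra dependencies. The class SketchAsyncZip, the State enum, and the omission of OpeningAbortError handling are all assumptions.

```python
import asyncio
import enum
import io
import zipfile
from typing import BinaryIO, List, Optional


class State(enum.Enum):
    CLOSED = "closed"
    OPENING = "opening"
    OPEN = "open"


class SketchAsyncZip:
    """Hypothetical lifecycle guard; not the library's AsyncZipFile."""

    def __init__(self, file: BinaryIO) -> None:
        self._file = file
        self._state = State.CLOSED
        self._zip: Optional[zipfile.ZipFile] = None

    async def open(self) -> None:
        if self._state is not State.CLOSED:
            raise RuntimeError("archive is already open or opening is in progress")
        self._state = State.OPENING
        try:
            # ZipFile() parses the central directory, which is blocking I/O.
            self._zip = await asyncio.to_thread(zipfile.ZipFile, self._file)
        except BaseException:
            self._state = State.CLOSED  # roll back so a later open() can retry
            raise
        self._state = State.OPEN

    async def close(self) -> None:
        if self._zip is not None:
            await asyncio.to_thread(self._zip.close)
            self._zip = None
        self._state = State.CLOSED


async def demo() -> List[str]:
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("a.txt", b"data")
    archive = SketchAsyncZip(buffer)
    await archive.open()
    events = ["opened"]
    try:
        await archive.open()  # second open must be rejected
    except RuntimeError:
        events.append("second open rejected")
    await archive.close()
    events.append(archive._state.value)
    return events


lifecycle_events = asyncio.run(demo())
```

Recording an explicit OPENING state is what allows a second open() issued while the first is still running in the worker thread to fail fast instead of racing.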

files() async

Iterates over file entries in the ZIP archive.

Directory entries are skipped. Each yielded object represents a single file inside the archive and must be opened explicitly before reading.

Yields:

Type Description
AsyncGenerator[AsyncZipMember, None]

AsyncZipMember instances for each non-directory entry.

Raises:

Type Description
RuntimeError

If the ZIP archive has not been opened.
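The behaviour of files() can be sketched as an async generator over zipfile.ZipInfo entries. This is an illustration only: iter_files is a hypothetical free function that yields raw ZipInfo objects rather than AsyncZipMember instances, and it models "not opened" as a None archive. asyncio is used here only to drive the generator with the standard library.

```python
import asyncio
import io
import zipfile
from typing import AsyncGenerator, List, Optional, Tuple


async def iter_files(
    archive: Optional[zipfile.ZipFile],
) -> AsyncGenerator[zipfile.ZipInfo, None]:
    """Yield non-directory entries; fail if the archive was never opened."""
    if archive is None:
        raise RuntimeError("ZIP archive has not been opened")
    for info in archive.infolist():
        if info.is_dir():
            continue  # directory entries are skipped
        yield info


async def demo() -> Tuple[List[str], bool]:
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("docs/", b"")  # explicit directory entry
        zf.writestr("docs/readme.txt", b"hi")
        zf.writestr("main.py", b"print('x')")

    with zipfile.ZipFile(buffer) as zf:
        names = [info.filename async for info in iter_files(zf)]

    try:
        async for _ in iter_files(None):
            pass
        unopened_rejected = False
    except RuntimeError:
        unopened_rejected = True
    return names, unopened_rejected


file_names, unopened_rejected = asyncio.run(demo())
```

Because this is a generator, the RuntimeError surfaces on the first iteration step rather than when iter_files() is called.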

AsyncZipMember(info, file)

Asynchronous wrapper around a single file inside a ZIP archive.

Provides an async-friendly interface for opening, reading, and closing individual archive members.

The underlying file handle is opened in a worker thread and guarded by explicit state tracking. Reads are protected by a lock to prevent concurrent access. Cleanup is safe under cancellation.

A member must be opened before reading and should be closed explicitly, or via an async context manager, once processing is complete.
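The combination of explicit open state, a read lock, and async context manager support might look roughly like the sketch below. It is hypothetical: SketchMember is not the library's AsyncZipMember, asyncio.Lock and asyncio.to_thread stand in for their Trio counterparts, and cancellation-safe cleanup is left out.

```python
import asyncio
import io
import zipfile
from typing import IO, List, Optional, Tuple


class SketchMember:
    """Hypothetical member wrapper; not the library's AsyncZipMember."""

    def __init__(self, archive: zipfile.ZipFile, info: zipfile.ZipInfo) -> None:
        self._archive = archive
        self._info = info
        self._handle: Optional[IO[bytes]] = None
        self._lock = asyncio.Lock()  # serializes reads on the shared handle

    async def open(self) -> None:
        if self._handle is not None:
            raise RuntimeError("member is already open")
        self._handle = await asyncio.to_thread(self._archive.open, self._info)

    async def read(self, size: int = -1) -> bytes:
        async with self._lock:
            if self._handle is None:
                raise RuntimeError("member must be opened before reading")
            return await asyncio.to_thread(self._handle.read, size)

    async def close(self) -> None:
        handle, self._handle = self._handle, None
        if handle is not None:
            await asyncio.to_thread(handle.close)

    async def __aenter__(self) -> "SketchMember":
        await self.open()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self.close()


async def demo() -> Tuple[bool, List[bytes]]:
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("a.txt", b"abcdef")
    with zipfile.ZipFile(buffer) as zf:
        member = SketchMember(zf, zf.getinfo("a.txt"))
        try:
            await member.read(3)  # not opened yet
            early_read_rejected = False
        except RuntimeError:
            early_read_rejected = True
        async with member:
            # Two concurrent reads; the lock keeps them from interleaving.
            parts = await asyncio.gather(member.read(3), member.read(3))
    return early_read_rejected, sorted(parts)


early_read_rejected, read_parts = asyncio.run(demo())
```

Without the lock, two tasks could call read() on the same file handle from different worker threads at once, which the underlying file object does not guarantee to handle safely.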

open() async

Opens the ZIP member asynchronously.

The underlying file object is obtained in a worker thread to avoid blocking the Trio event loop.

Raises:

Type Description
RuntimeError

If the member is already open or opening is in progress.

OpeningAbortError

If the open operation is aborted.

BaseException

Any unexpected error raised while opening the file.

chunks(chunk_size=None) async

Streams the contents of the ZIP member as byte chunks.

Parameters:

chunk_size (int | None, default: None)

Maximum number of bytes per chunk. If not provided, the file is read until EOF in a single call.

Yields:

Type Description
AsyncGenerator[bytes, None]

Byte chunks read from the ZIP member.

Raises:

Type Description
RuntimeError

If the member was not opened before reading.

OpeningAbortError

If the opening process was aborted.
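The chunk_size semantics come down to a short read loop. The synchronous sketch below shows only that blocking core, which an async wrapper would run in a worker thread. The name read_chunks is hypothetical, and two details are assumptions: empty files yield no chunks, and a non-positive chunk_size is rejected.

```python
import io
from typing import BinaryIO, Iterator, Optional


def read_chunks(handle: BinaryIO, chunk_size: Optional[int] = None) -> Iterator[bytes]:
    """Yield the handle's contents in pieces of at most chunk_size bytes."""
    if chunk_size is None:
        data = handle.read()  # single read until EOF
        if data:
            yield data
        return
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    while True:
        data = handle.read(chunk_size)
        if not data:  # empty bytes signals EOF
            break
        yield data


sized = list(read_chunks(io.BytesIO(b"abcdefg"), chunk_size=3))
whole = list(read_chunks(io.BytesIO(b"abcdefg")))
```

Passing an explicit chunk_size bounds memory use per step, whereas the default loads the entire uncompressed member into memory at once.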

OpeningAbortError

Bases: Exception

Raised when an asynchronous ZIP open operation is aborted.

Indicates that an attempt to open a ZIP file or one of its members did not complete successfully and was explicitly marked as aborted.