# antyr
This project focuses on core crawling primitives: making HTTP requests, consuming responses as streams, and persisting streamed content with explicit, cancellation-safe lifetimes.
Unlike full-featured frameworks, antyr does not implement an end-to-end scraping pipeline. Parsing, extraction logic, data modeling, retries, scheduling, and storage are left to the caller.
> **Tip**
>
> If you want a batteries-included scraping framework, consider Scrapy.
## Installation
Install via pip:
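```shell
# assumes the package is published on PyPI under the name "antyr"
pip install antyr
```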
Or using uv:
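```shell
# assumes the package is published on PyPI under the name "antyr"
uv add antyr
```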
The following packages will be installed alongside antyr:
- `trio` -- structured concurrency runtime
- `httpx[socks]` -- HTTP client with SOCKS proxy support
- `stem` -- Tor control port integration
## Quickstart
### Fetch and process a response as a stream
Instead of buffering the entire response in memory, the response can be processed incrementally as it is received.
```python
import trio

from antyr import HttpCrawler


async def main() -> None:
    async with HttpCrawler("https://httpbin.org") as crawler:
        stream = await crawler.fetch("/json").content_stream()
        async for chunk in stream:
            # process chunk
            ...


trio.run(main)
```
If the response body is an archive, it can be extracted before processing by calling `extract()`. The extracted content is exposed through the same streaming interface.
```python
import trio

from antyr import HttpCrawler


async def main() -> None:
    async with HttpCrawler("https://example.com") as crawler:
        stream = await crawler.fetch("/archive.zip").extract()
        async for chunk in stream:
            # process chunk
            ...


trio.run(main)
```
### Stream to disk
Stream the response body directly to disk.
```python
import trio

from antyr import HttpCrawler


async def main() -> None:
    async with HttpCrawler("https://httpbin.org") as crawler:
        await crawler.fetch("/image/png").save("downloads")


trio.run(main)
```
The target filename is derived from the response headers or URL and normalized for filesystem safety.
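As a rough illustration of what such derivation can look like (this is a standalone sketch, not antyr's actual implementation; the `derive_filename` helper and its regexes are assumptions), a filename might be taken from the `Content-Disposition` header when present, fall back to the last URL path segment, and then have filesystem-unsafe characters replaced:

```python
import re
from urllib.parse import unquote, urlsplit


def derive_filename(url: str, headers: dict[str, str]) -> str:
    """Pick a filename from Content-Disposition, else the URL path."""
    disposition = headers.get("content-disposition", "")
    match = re.search(r'filename="?([^";]+)"?', disposition)
    if match:
        name = match.group(1)
    else:
        # Fall back to the final path segment of the URL.
        name = unquote(urlsplit(url).path.rsplit("/", 1)[-1]) or "download"
    # Replace characters that are unsafe on common filesystems.
    return re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", name)
```

For example, `derive_filename("https://httpbin.org/image/png", {})` would yield `png`, while a `Content-Disposition: attachment; filename="report.zip"` header would win out over the URL.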