Wayback Cache Proxy
A Redis-backed caching proxy for the Wayback Machine, ensuring reliable offline access to archived web content for museum exhibitions.
Role: Developer
About the Project
Wayback Cache Proxy is a caching HTTP proxy for the Internet Archive’s Wayback Machine, developed for ZKM’s Choose Your Filter! browser art exhibition. When the Internet Archive went offline for weeks in late 2024 due to DDoS attacks and a data breach, it became clear that an exhibition depending on real-time Wayback Machine access needed a local fallback. The proxy fetches archived pages once, caches them in Redis, and serves them independently — ensuring the exhibition runs reliably regardless of the Internet Archive’s availability.
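The fetch-once behaviour described above follows the familiar cache-aside pattern. The sketch below is illustrative only and is not taken from the project's source: `make_cached_fetcher`, `fetch_from_wayback`, and the `page:` key prefix are hypothetical names, and a plain dict stands in for Redis so the example runs without a server.

```python
from typing import Callable, MutableMapping


def make_cached_fetcher(
    fetch_from_wayback: Callable[[str], bytes],  # hypothetical upstream fetch function
    store: MutableMapping[str, bytes],           # stand-in for a Redis connection
) -> Callable[[str], bytes]:
    """Return a function that serves pages from the store, fetching only on a miss."""

    def get_page(url: str) -> bytes:
        key = "page:" + url  # assumed key scheme, for illustration only
        cached = store.get(key)
        if cached is not None:
            return cached  # hit: the upstream archive is never contacted
        body = fetch_from_wayback(url)  # miss: one upstream request
        store[key] = body
        return body

    return get_page


# Demonstration with a fake upstream that records each call it receives.
calls = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"<html>archived page</html>"


store = {}
get_page = make_cached_fetcher(fake_fetch, store)
first = get_page("http://example.com/")
second = get_page("http://example.com/")  # served from the store
```

After the first request the page is answered from the store, so an upstream outage would no longer affect it.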
Features
- Two-tier Redis cache — a permanent curated tier for verified exhibition content and an auto-expiring hot tier for visitor-discovered pages
- Prefetch crawler — async spider that pre-populates the cache from seed URLs before exhibitions open
- Modem speed throttling — period-accurate simulation of 14.4k, 28.8k, 56k, ISDN, and DSL connections, selectable by visitors
- Content transformation — removes Wayback Machine toolbar, fixes asset URLs, and strips injected scripts for clean rendering
- Admin interfaces — FastAPI dashboard for remote management and an embedded IE4-compatible interface for on-site use
- Live configuration reload — settings changes via admin UI take effect without restarting the proxy
- URL allowlisting — restricts browsable domains for curated exhibition experiences
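Three of the features above (the two-tier cache, modem throttling, and URL allowlisting) can be sketched in a few lines each. This is a minimal illustration under stated assumptions, not the project's implementation: all function and class names are hypothetical, the tiers are modelled with in-memory dicts rather than Redis, and the DSL rate is an assumed figure. In Redis, the hot tier would typically correspond to keys written with an expiry and the curated tier to keys written without one.

```python
import time
from urllib.parse import urlsplit

# Nominal line rates in bits per second. The DSL value is an assumption.
MODEM_BPS = {
    "14.4k": 14_400,
    "28.8k": 28_800,
    "56k": 56_000,
    "isdn": 64_000,
    "dsl": 768_000,
}


def transfer_seconds(num_bytes: int, profile: str) -> float:
    """Time a payload of num_bytes takes at the profile's nominal rate."""
    return num_bytes * 8 / MODEM_BPS[profile]


def throttled_chunks(body: bytes, profile: str, chunk_size: int = 512, sleep=time.sleep):
    """Yield body in chunks, pausing after each one to match the profile's rate."""
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        yield chunk
        sleep(transfer_seconds(len(chunk), profile))


def is_allowed(url: str, allowed_domains: set[str]) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


class TwoTierCache:
    """In-memory model of a permanent curated tier and an expiring hot tier."""

    def __init__(self, hot_ttl: float = 3600.0, clock=time.monotonic):
        self.curated: dict[str, bytes] = {}            # never expires
        self.hot: dict[str, tuple[float, bytes]] = {}  # (expires_at, body)
        self.hot_ttl = hot_ttl
        self.clock = clock

    def get(self, url: str):
        """Look in the curated tier first, then in the hot tier if still fresh."""
        if url in self.curated:
            return self.curated[url]
        entry = self.hot.get(url)
        if entry is None:
            return None
        expires_at, body = entry
        if self.clock() >= expires_at:
            del self.hot[url]  # expired: drop the entry and report a miss
            return None
        return body

    def put_hot(self, url: str, body: bytes) -> None:
        """Store a visitor-discovered page with an expiry time."""
        self.hot[url] = (self.clock() + self.hot_ttl, body)

    def promote(self, url: str) -> bool:
        """Move a hot entry into the permanent curated tier."""
        entry = self.hot.pop(url, None)
        if entry is None:
            return False
        self.curated[url] = entry[1]
        return True
```

Passing `sleep` and `clock` as parameters keeps the sketch testable without real delays. A production proxy built on asyncio would use a non-blocking sleep in place of `time.sleep`.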
Open Source
Released under the MIT license on ZKM’s GitHub. While built for museum use, the architecture supports any scenario requiring reliable, offline-capable access to archived web content.