Back to overview
Downtime

Service outage

Jun 11 at 02:36pm UTC
Affected services
Service
API
Dashboard
Dashboard API
REST API (API v3)
Authentication
Recache

Resolved
Jun 11 at 02:37pm UTC

Service recovered

Created
Jun 11 at 02:36pm UTC

At 06 : 45 CEST (UTC +2) on 11 June 2025 every compute node in our production cluster was powered off simultaneously by the upstream infrastructure provider. Public-facing APIs, workers and scheduled jobs were completely unavailable for 4 h 45 m. A reduced-capacity service then ran with elevated latency and intermittent 503s until 16 : 25 CEST, when the final batch of nodes was re-attached and all metrics returned to baseline.

Root Cause (still under investigation)
No definitive response has been received from the infrastructure provider at the time of writing. The working hypothesis is an unscheduled maintenance event on their side that powered off hardware which had been running continuously for ~2.5 years. Because the shutdown was ungraceful, filesystems required replay and the cluster could not auto-recover.