

What happens if Railway has an outage?
Railway is a cloud deployment platform for web services, databases, and background jobs. This article covers what happens to workloads during a Railway outage, what has caused past incidents, how long they last, what teams can do to reduce operational exposure, and what alternatives exist for teams evaluating other platforms.
Railway has experienced five major incidents between November 2025 and May 2026, ranging from deployment queue failures to an approximately 8-hour full platform outage. During a Railway outage, the impact depends on which layer of the platform is affected. In the most severe cases, workloads become unreachable, databases become inaccessible, and the dashboard goes offline. Teams running production workloads on Railway should understand what each failure mode means operationally and what mitigations are available.
For teams evaluating alternatives with contractual uptime guarantees, Northflank operates at 99.99% historical uptime, guaranteed under SLAs on enterprise agreements. It supports multi-cloud deployment across AWS, GCP, Azure, Oracle, CoreWeave, and Civo; self-serve BYOC on pay-as-you-go and enterprise plans (not restricted to the Enterprise tier); bare-metal and on-premises deployments via Bring Your Own Kubernetes; GPU workloads; preview environments; and more. Get started (self-serve) or book a demo to walk through your specific setup.
The impact of a Railway outage depends on which layer of the platform is affected. Based on Railway's published postmortems, incidents have fallen into three broad failure categories.
Deployment layer failures block new deployments while leaving running workloads and networking unaffected. The November 2025 GitHub webhook surge is a documented example: deployments stalled across all tiers for approximately two and a half hours while services that were already running continued to operate.
Infrastructure layer failures can degrade active workloads. The December 2025 cryptominer incident is a documented example: CPU starvation and private networking slowdowns affected a subset of workloads for approximately four hours.
Control plane failures are the most severe. The May 2026 GCP account suspension is the documented example. When the control plane went offline, edge proxies could no longer resolve routes to active instances, and workloads across all regions became unreachable. Persistent disks were also inaccessible, so database backups could not be retrieved for the duration of the incident.
Railway's February 2026 postmortem identified a pattern across incidents: tightly coupled systems with a large blast radius, where a single failure cascades into a broader outage. That pattern was present across each of the five documented incidents between November 2025 and May 2026.
The root causes varied: a webhook surge, a cryptominer exploit via a third-party vulnerability, DDoS attacks compounded by upstream fiber cuts and a Cloudflare BGP outage, a CDN configuration update that accidentally enabled caching on domains that had it disabled, and a GCP account suspension that took the network control plane offline. What connected them was the architectural consequence: a failure in one layer propagating into broader platform impact.
For the full incident breakdown and postmortem detail, see Railway app outage: where to host your projects instead.
Northflank's deployment options include:
- Bring your own cloud: deploy across AWS, GCP, Azure, Oracle, CoreWeave, and Civo from your own account
- Managed cloud: deploy into Northflank's global regions with no infrastructure setup required
- Bare-metal and on-premises: import existing Kubernetes clusters or deploy to on-premises infrastructure via Bring Your Own Kubernetes
- Customer VPC deployments: deploy your product directly into your customers' own cloud environments
- GPU workloads: deploy and scale GPU-backed services across supported cloud providers
- Preview environments: automatically provision isolated environments on every pull request
Based on the five documented incidents between November 2025 and May 2026, durations have ranged from 52 minutes to approximately 8 hours for single incidents, with one multi-day intermittent disruption spanning February 18–21, 2026. The February incident comprised nine separate DDoS attacks plus a Cloudflare BGP outage, each with individual customer impact windows ranging from 30 seconds to 48 minutes.
The severity and duration depend on the failure type. Deployment queue failures resolved within a few hours. The control plane failure in May 2026 lasted approximately 8 hours and required Google Cloud to reinstate the suspended account before Railway could restore service.
Yes, a Railway outage can take databases offline, based on documented incidents. During the May 2026 outage, all customer-accessible databases went offline when the GCP account suspension took Railway's control plane down. Persistent disks were also inaccessible, meaning database backups could not be retrieved for the duration of the incident.
Railway provisions managed databases including PostgreSQL, MySQL, Redis, and MongoDB. The May 2026 incident showed that in a control plane failure, databases are subject to the same platform-wide impact as other workloads.
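Keeping a copy of each dump outside the platform removes that dependency. The following is a minimal sketch, not a definitive implementation. It assumes a PostgreSQL database reachable through a connection string, the `pg_dump` binary on the PATH, the third-party `boto3` package with AWS credentials already configured, and a job scheduled somewhere other than Railway. The bucket name, key layout, and function names are hypothetical.

```python
import datetime
import subprocess
from pathlib import Path


def backup_key(db_name: str, now: datetime.datetime) -> str:
    """Build a timestamped object key (the key layout is an illustrative assumption)."""
    return f"backups/{db_name}/{now.strftime('%Y-%m-%dT%H%M%SZ')}.dump"


def pg_dump_command(database_url: str, out_path: Path) -> list[str]:
    """Return the pg_dump invocation for a custom-format dump written to out_path."""
    return [
        "pg_dump",
        "--format=custom",
        f"--file={out_path}",
        f"--dbname={database_url}",
    ]


def run_backup(database_url: str, db_name: str, bucket: str, workdir: Path) -> str:
    """Dump the database locally, upload the dump to S3, and return the object key."""
    import boto3  # third-party; imported lazily so the helpers above work without it

    now = datetime.datetime.now(datetime.timezone.utc)
    key = backup_key(db_name, now)
    out_path = workdir / "latest.dump"
    # check=True raises CalledProcessError if pg_dump exits with a non-zero status.
    subprocess.run(pg_dump_command(database_url, out_path), check=True)
    boto3.client("s3").upload_file(str(out_path), bucket, key)
    return key
```

A dump like this can only be taken while the database is reachable, so it needs to run on a regular schedule before an incident rather than during one.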
Based on Railway's documented incident history, the following measures reduce operational exposure during an outage:
- Maintain off-platform database backups. The May 2026 incident demonstrated that Railway's built-in backup access depends on the dashboard and API being online. Streaming database backups to an external provider such as AWS S3 or GCS ensures access is not dependent on Railway's availability.
- Set up independent status monitoring. Configure uptime monitoring via an external service that checks your endpoints independently of Railway's infrastructure. Do not rely solely on Railway's status page to detect incidents.
- Use Railway's multi-region replicas for stateless services. Railway supports deploying replicas across multiple regions. For stateless services, this distributes traffic across regions, reducing exposure to single-region incidents. It does not protect against control plane failures.
- Configure log drains to an external provider. Streaming logs to an external provider such as Datadog or Axiom means log data remains accessible even when Railway's dashboard is offline.
- Review contractual SLA requirements. Railway publishes availability targets on paid plans, but contractual SLAs with service credits require Business Class or Enterprise. Teams with contractual uptime obligations should verify that their plan covers those requirements.
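The independent monitoring measure above can be sketched as a small probe that runs on infrastructure outside Railway. This is an illustrative sketch using only the Python standard library. The function names, the 5-second timeout, and the three-failure alert threshold are assumptions rather than recommendations from Railway, and a hosted uptime service would serve the same purpose.

```python
import http.client
import urllib.request


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx), timeouts, and connection resets land here.
        return False


def should_alert(results: list[bool], threshold: int = 3) -> bool:
    """Alert only after `threshold` consecutive failed probes, to ignore one-off blips."""
    if len(results) < threshold:
        return False
    return not any(results[-threshold:])
```

A scheduler would call `probe` against a health endpoint at a fixed interval, append each result to a list, and page someone when `should_alert` returns True. Requiring several consecutive failures trades a slightly slower alert for fewer false alarms.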
For teams where these mitigations are insufficient, particularly around contractual uptime, control plane dependency, or data residency, evaluating a platform with a different architectural model like Northflank may be the more practical path.
Railway publishes availability targets on paid plans (99.9% Hobby, 99.99% Pro, 99.999% Enterprise) and has published postmortems for five major incidents between November 2025 and May 2026. Contractual SLAs with service credits require Business Class or Enterprise. The Pro plan explicitly excludes SLOs per Railway's support documentation.
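To put those percentages in operational terms, the sketch below converts an availability target into a downtime budget and converts an outage duration back into an availability figure. The 30-day measurement window is an assumption made for illustration. This article does not state which window or exclusions Railway applies, and the function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(target_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime a given availability target permits over the window."""
    return (1 - target_percent / 100) * window_days * MINUTES_PER_DAY


def achieved_availability(outage_minutes: float, window_days: int = 30) -> float:
    """Availability percentage for a window containing the given total outage time."""
    total_minutes = window_days * MINUTES_PER_DAY
    return 100 * (1 - outage_minutes / total_minutes)
```

Under that 30-day assumption, targets of 99.9%, 99.99%, and 99.999% allow roughly 43.2, 4.32, and 0.43 minutes of downtime respectively. An outage of approximately 8 hours (480 minutes) works out to about 98.89% availability for the window, below all three targets.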
For a detailed assessment of Railway's production readiness, plan structure, and feature limitations, see Is Railway good for production workloads?
For teams evaluating platforms with contractual uptime guarantees, multi-cloud deployment, bare-metal and on-premises support, or BYOC (Bring Your Own Cloud) support to reduce single-provider risk, Northflank provides a control plane for deploying services, workers, databases, and GPU workloads on Kubernetes infrastructure. It supports simultaneous multi-cloud deployments across AWS, GCP, Azure, Oracle, CoreWeave, and Civo, as well as bare-metal and on-premises deployments via Bring Your Own Kubernetes.
Distributing workloads across multiple independent cloud providers reduces the risk of a single provider action taking all workloads offline simultaneously. With Northflank's Bring Your Own Cloud model, workloads and data stay inside your own cloud account. Northflank also operates its own managed cloud for teams that prefer a fully managed experience.
For a broader comparison of platforms, see 6 best Railway alternatives in 2026.
Based on documented incidents, the impact depends on which platform layer is affected. In documented deployment layer failures, new deploys stall while running services continue. In documented infrastructure layer failures, a subset of active workloads is degraded. The most severe documented failure was the May 2026 control plane outage, where all workloads across all regions became unreachable once routing caches expired.
Check status.railway.com for live status. Railway publishes real-time incident updates and history on that page.
Railway's documented incident causes between November 2025 and May 2026 include a webhook-driven deployment queue failure, a cryptominer exploit causing CPU starvation across a subset of workloads, DDoS attacks compounded by upstream fiber cuts and a Cloudflare BGP outage, a CDN configuration update that accidentally cached authenticated responses, and a GCP account suspension that took the network control plane offline for approximately 8 hours.
Documented Railway outages between November 2025 and May 2026 ranged from 52 minutes to approximately 8 hours for single incidents, with one multi-day intermittent disruption comprising nine separate attack waves across February 18–21, 2026.
Yes, Railway outages can affect databases. During the May 2026 outage, all customer-accessible databases went offline and persistent disks were inaccessible for the duration of the incident, meaning database backups could not be retrieved. Maintaining off-platform backups reduces exposure to this risk.
Key mitigations include maintaining off-platform database backups, configuring external uptime monitoring, streaming logs to an external provider, and reviewing whether your plan tier includes contractual SLAs. For teams where these mitigations are insufficient, a platform like Northflank provides multi-cloud deployment, contractual uptime SLAs, and BYOC across plans.
Northflank provides 99.99% historical uptime with contractual SLAs on enterprise agreements, BYOC across plans, multi-cloud deployment, and managed Kubernetes. For a full comparison, see 6 best Railway alternatives in 2026.