Firebase Status Dashboard
For incidents related to Cloud Functions, Cloud Firestore and Cloud Storage, please see Cloud Status Dashboard. And for incidents related to Google Analytics, please see Ads Status Dashboard.
Incident affecting Realtime Database
Firebase Realtime Database experiencing network outages in us-central1
Incident began at 2021-04-16 15:58 and ended at 2021-04-16 16:37 (all times are US/Pacific).
22 Apr 2021, 11:00 PDT
On 2021-04-16 15:58 US/Pacific, Firebase Realtime Database (RTDB) experienced an outage, rejecting all requests to affected databases for 40 minutes. The issue affected 34% of active databases in the us-central1 region, which together made up 60% of the total traffic. The initial service information about the incident was posted to the Firebase status dashboard at 16:33, and the service outage was reported on the status dashboard at 17:16.
To our Firebase customers whose businesses were impacted during this outage, we sincerely apologize. This is not the level of quality and reliability we strive to offer you. We have conducted an internal investigation and are taking immediate steps to improve the platform’s performance and availability.
RTDB backends in the us-central1 region are currently being migrated between two different clusters, which requires network peering. The root cause of this incident was a software update to the network control plane that introduced a risk of configuration inconsistency between peered networks. At 14:16, a sequence of configuration updates was propagated to the network control plane elements in an RTDB network, leading to this inconsistency. The trigger for this issue was a background job that increased the number of instances in the affected cluster and crossed a threshold of network nodes at 15:58, causing traffic to be rerouted through the affected network control plane elements. This in turn resulted in packet loss and service unavailability for databases that had been migrated to the new cluster. At 16:38, an automated update to the routers brought the network back to a consistent state, ending the event.
REMEDIATION AND PREVENTION
The problem started to manifest at 2021-04-16 15:58 US/Pacific. Google engineers were alerted to capacity loss three minutes later and immediately started an investigation. The system started recovering when an automated update to the network control plane brought it back to a consistent state, and by 16:38 the issue was mitigated. As soon as the issue was linked to the faulty network control plane elements, the background job was identified as a contributing factor. The engineers stopped the background job at 16:59 to reduce the risk of recurrence. The rollback of the router update was initiated at 17:47 to fully resolve the issue.
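For readers who want to check the intervals implied by the timestamps above, the following is a minimal sketch that computes them. It uses only times quoted in this report; the dictionary keys and the `minutes_between` helper are illustrative names, not part of any Firebase tooling.

```python
from datetime import datetime

# Timestamps quoted in this report (all US/Pacific, 2021-04-16).
# The key names are illustrative labels chosen for this sketch.
FMT = "%Y-%m-%d %H:%M"
timeline = {
    "config_updates_propagated": "2021-04-16 14:16",
    "outage_start": "2021-04-16 15:58",
    "first_dashboard_post": "2021-04-16 16:33",
    "mitigated": "2021-04-16 16:38",
    "background_job_stopped": "2021-04-16 16:59",
    "rollback_initiated": "2021-04-16 17:47",
}


def minutes_between(start_key: str, end_key: str) -> int:
    """Return the whole minutes elapsed between two timeline events."""
    start = datetime.strptime(timeline[start_key], FMT)
    end = datetime.strptime(timeline[end_key], FMT)
    return int((end - start).total_seconds() // 60)


if __name__ == "__main__":
    print("Outage duration:", minutes_between("outage_start", "mitigated"), "min")
    print("Latent inconsistency before trigger:",
          minutes_between("config_updates_propagated", "outage_start"), "min")
    print("Start to first dashboard post:",
          minutes_between("outage_start", "first_dashboard_post"), "min")
    print("Mitigation to rollback start:",
          minutes_between("mitigated", "rollback_initiated"), "min")
```

The 40-minute result for the outage window matches the duration stated in the summary. The other intervals are derived from the quoted timestamps and are not figures stated in the report itself.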
To guard against the issue recurring and to reduce the impact of similar events, we are taking the following actions:
Google is committed to quickly and continually improving our technology and operations to prevent service disruptions. We appreciate your patience and apologize again for the impact to your organization. We thank you for your business.
If you believe your paid application experienced an SLA violation as a result of this incident, please follow these instructions to request an SLA-related credit: https://firebase.google.com/terms/service-level-agreement#4_credit_request_and_payment_procedures
16 Apr 2021, 17:26 PDT
The issue with Firebase Realtime Database has been resolved for all affected users as of Friday, 2021-04-16 16:38 US/Pacific.
We will publish an analysis of this incident once we have completed our internal investigation.
We thank you for your patience while we worked on resolving the issue.
16 Apr 2021, 17:16 PDT
We are experiencing an intermittent network connectivity issue with Firebase Realtime Database in us-central1, beginning at Friday, 2021-04-16 14:59 US/Pacific.
Our engineering team continues to investigate the issue.
We will provide an update by Friday, 2021-04-16 17:30 US/Pacific with current details.
We apologize to all who are affected by the disruption.