Firebase Status Dashboard

This page provides status information on the services that are part of Firebase. Check back here to view the current status of the services listed below. If you are experiencing an issue not listed here, please contact Support. Learn more about what's posted on the dashboard in this FAQ. For additional information on these services, please visit https://firebase.google.com/.

NOTE

For incidents related to Cloud Functions, Cloud Firestore and Cloud Storage, please see Cloud Service Health and the Service Health Dashboard in the Cloud Console. For incidents related to Google Analytics, please see Ads Status Dashboard.

Incident affecting Realtime Database

Firebase Realtime Database experiencing network outages in us-central1

Incident began at 2021-04-16 15:58 and ended at 2021-04-16 16:37 (all times are US/Pacific).

Date

Time

Description

22 Apr 2021

11:00 PDT

ISSUE SUMMARY

On 2021-04-16 15:58 US/Pacific, Firebase Realtime Database (RTDB) experienced an outage, rejecting all requests to affected databases for 40 minutes. The issue affected 34% of active databases in the us-central1 region, which together accounted for 60% of the total traffic. The initial service information about the incident was posted to the Firebase status dashboard at 16:33, and the service outage was reported on the status dashboard at 17:16.

To our Firebase customers whose businesses were impacted during this outage, we sincerely apologize – this is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability. We have conducted an internal investigation and are taking steps to improve our service.

ROOT CAUSE

RTDB backends in the us-central1 region are currently being migrated between two different clusters, which requires network peering. The root cause of this incident was a software update to the network control plane that introduced a risk of configuration inconsistency between peered networks. At 14:16, a sequence of configuration updates was propagated to the network control plane elements in an RTDB network, leading to this inconsistency. The trigger for this issue was a background job that increased the number of instances in the affected cluster and crossed a threshold of network nodes at 15:58, causing traffic to be rerouted through the affected network control plane elements. This in turn resulted in packet loss and service unavailability for databases that had been migrated to the new cluster. At 16:38, an automated update to the routers brought the network back to a consistent state, ending the event.

REMEDIATION AND PREVENTION

The problem started to manifest at 2021-04-16 15:58 US/Pacific. Google engineers were alerted to capacity loss three minutes later and immediately started an investigation. The system began recovering when an automated update to the network control plane brought it back to a consistent state, and by 16:38 the issue was mitigated. As soon as the issue was linked to the faulty network control plane elements, the background job was identified as a contributing factor. The engineers stopped the background job at 16:59 to reduce the risk of recurrence. The rollback of the router update was initiated at 17:47 to fully resolve the issue.
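For readers who want to check the timeline arithmetic, the timestamps quoted in this report can be laid out as offsets from the onset of the outage. The following is only an illustrative sketch, not part of the original report: the helper names (at, minutes_between, offsets_from_onset) are hypothetical, and the 16:01 entry is derived from the statement that engineers were alerted three minutes after 15:58 rather than quoted directly.

```python
from datetime import datetime

# Timestamps quoted in the report (all US/Pacific, on 2021-04-16).
# The 16:01 entry is derived: "alerted ... three minutes later" than 15:58.
DAY = "2021-04-16"
EVENTS = [
    ("14:16", "configuration updates propagated to the control plane"),
    ("15:58", "outage begins"),
    ("16:01", "engineers alerted to capacity loss (derived)"),
    ("16:33", "initial information posted to the status dashboard"),
    ("16:38", "network back in a consistent state, issue mitigated"),
    ("16:59", "background job stopped"),
    ("17:16", "outage reported on the status dashboard"),
    ("17:47", "rollback of the router update initiated"),
]


def at(hhmm: str) -> datetime:
    """Parse an HH:MM wall-clock time on the incident day."""
    return datetime.strptime(f"{DAY} {hhmm}", "%Y-%m-%d %H:%M")


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end (negative if end is earlier)."""
    return int((at(end) - at(start)).total_seconds() // 60)


def offsets_from_onset(onset: str = "15:58"):
    """Return (offset_in_minutes, label) for every event, relative to onset."""
    return [(minutes_between(onset, t), label) for t, label in EVENTS]


if __name__ == "__main__":
    for offset, label in offsets_from_onset():
        print(f"{offset:+4d} min  {label}")
```

Under these assumptions, minutes_between("15:58", "16:38") gives the 40-minute outage duration stated in the issue summary, and minutes_between("15:58", "16:33") gives the 35 minutes that elapsed before the first dashboard post.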

To guard against the issue recurring and to reduce the impact of similar events, we are taking the following actions:

Google engineering teams will review and extend test coverage for their networking code to prevent a similar class of issues in the future.
Firebase will expedite the backend migration to the new cluster and avoid the need for additional network peering.
We will improve early customer alerting mechanisms to improve response time.

Google is committed to quickly and continually improving our technology and operations to prevent service disruptions. We appreciate your patience and apologize again for the impact to your organization. We thank you for your business.

SLA CREDITS

If you believe your paid application experienced an SLA violation as a result of this incident, please follow these instructions to request an SLA-related credit: https://firebase.google.com/terms/service-level-agreement#4_credit_request_and_payment_procedures

16 Apr 2021

17:26 PDT

The issue with Firebase Realtime Database has been resolved for all affected users as of Friday, 2021-04-16 16:38 US/Pacific.

We will publish an analysis of this incident once we have completed our internal investigation.

We thank you for your patience while we worked on resolving the issue.

16 Apr 2021

17:16 PDT

We are experiencing an intermittent network connectivity issue with Firebase Realtime Database in us-central1, beginning at Friday, 2021-04-16 14:59 US/Pacific.

Our engineering team continues to investigate the issue.

We will provide an update by Friday, 2021-04-16 17:30 US/Pacific with current details.

We apologize to all who are affected by the disruption.

All times are US/Pacific