Realtime Database Incident #18036
Issue with Realtime Database: REST and Functions are timing out for some projects
Incident began at 2018-06-26 02:10 and ended at 2018-06-26 04:58 (all times are US/Pacific).
Jul 06, 2018, 12:00
Starting at 2:10 AM PDT on Tuesday, 26 June 2018, some developers using the Firebase Realtime Database experienced timeouts for about 3 hours, until 4:58 AM PDT the same day.
DETAILED DESCRIPTION OF IMPACT:
REST API and Admin SDK requests to Realtime Database instances hosted on NSS-212, 230, and 516 timed out due to an internal networking bug that affected authorization checks for access tokens used by service accounts and project owners. We estimate that 5% of projects were affected by this bug.
The networking bug exhausted the socket connections to the internal authorization service that validates access tokens. Eventually, no sockets were left to make requests, and developers began to experience timeouts. The issue was identified by external developers who reached out to support, which then paged the Realtime Database oncall.
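As an illustration of the failure mode described above, the following is a minimal sketch of how a bounded set of connections runs dry when connections are never returned. It is not the Realtime Database implementation: `BoundedPool`, `check_token`, `simulate`, the pool size, and the timeout are all hypothetical names and values chosen for the example.

```python
import threading


class BoundedPool:
    """Hypothetical fixed-size pool standing in for sockets to an auth service."""

    def __init__(self, size):
        self.size = size
        self._slots = threading.BoundedSemaphore(size)
        self.in_use = 0  # simple counter; this sketch is single-threaded

    def acquire(self, timeout):
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free connection to the authorization service")
        self.in_use += 1

    def release(self):
        self.in_use -= 1
        self._slots.release()


def check_token(pool, leak, timeout=0.05):
    """Simulated authorization check; leak=True models a connection that is never returned."""
    pool.acquire(timeout)
    if not leak:
        pool.release()
    return True


def simulate(pool_size=4, requests=6, leak=True):
    """Run `requests` sequential checks and record whether each succeeded or timed out."""
    pool = BoundedPool(pool_size)
    outcomes = []
    for _ in range(requests):
        try:
            check_token(pool, leak)
            outcomes.append("ok")
        except TimeoutError:
            outcomes.append("timeout")
    return outcomes
```

With the leak enabled, the first `pool_size` checks succeed and every later check times out, which mirrors the gradual onset described above. With `leak=False`, all checks succeed.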
REMEDIATION AND PREVENTION:
The initial, short-term fix was to drain and restart the servers to reset their connections. This unblocked the authentication requests and stopped the timeouts.
To prevent a recurrence, Realtime Database engineers deployed a fix for the networking issue on Wednesday, 27 June 2018. They also added alerts that notify the Realtime Database oncall of similar issues, to minimize how long external developers experience timeouts.
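The report does not say what the new alerts measure. As one possible shape, assuming the monitored signal is the fraction of authorization-service connections in use, a paging rule might look like the sketch below. The function name `should_page`, the 0.9 threshold, and the three-sample window are hypothetical.

```python
def should_page(samples, threshold=0.9, sustained=3):
    """Return True when the last `sustained` utilization samples are all at or above `threshold`.

    `samples` is a list of connection-utilization readings between 0.0 and 1.0,
    ordered oldest to newest. Requiring several consecutive high readings
    avoids paging on a single brief spike.
    """
    if len(samples) < sustained:
        return False
    return all(s >= threshold for s in samples[-sustained:])
```

For example, `should_page([0.5, 0.95, 0.96, 0.97])` pages the oncall, while a single high reading surrounded by normal ones does not.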
Jun 26, 2018, 05:00
We experienced an issue with Realtime Database where REST and Functions had timeouts for some projects. The issue started at 2:10 AM US/Pacific and has been resolved for all affected users as of 4:58 AM US/Pacific. We will conduct an internal investigation of this issue and make appropriate improvements to our systems to prevent or minimize future recurrence. Apologies for the inconvenience.