Facebook says 'faulty configuration change' to blame for 6-hour outage

Facebook late Monday apologized for a six-hour outage that impacted the company's flagship social network, as well as ancillary services, blaming the downtime on a "faulty configuration change."

Facebook and its related services, including Instagram, WhatsApp, Messenger and Oculus VR, went offline at around 11:30 a.m. Eastern and remained inaccessible for about six hours. Subsequent reports suggested that a bad Border Gateway Protocol (BGP) update was to blame for the outage, and a new statement from Facebook seemingly confirms the theory.

In a blog post, Facebook VP of Engineering and Infrastructure Santosh Janardhan apologized for the "inconvenience" and explained that router configuration changes caused an interruption between its data centers.

"Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication," Janardhan said. "This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt."

The explanation jibes with information provided by Cloudflare, which earlier in the day traced the issue back to a BGP mishap that impacted traffic routing. At the time, some speculated that a simple DNS configuration error was behind the downtime, though that explanation was abandoned after certain DNS services were found to be functional but unresponsive.
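To illustrate why a routing problem can look like a DNS failure from the outside, here is a minimal toy sketch in Python. It is not Facebook's or Cloudflare's actual setup: the `RoutingTable` class, the `resolve` helper, the `hypothetical-backbone` label, and the documentation-only prefix `192.0.2.0/24` are all assumptions made for illustration. It models a routing table with longest-prefix matching and shows that once a prefix is withdrawn, a DNS server inside that prefix can still be running yet no longer be reachable.

```python
import ipaddress


class RoutingTable:
    """Toy routing table; real BGP involves far more state than this."""

    def __init__(self):
        self.routes = {}  # maps ip_network -> next-hop label

    def announce(self, prefix, next_hop):
        # Add (or overwrite) a route for the given prefix.
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        # Remove the route if present; ignore it otherwise.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        # Longest-prefix match: pick the most specific covering route.
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


def resolve(table, dns_server_ip, dns_server_running=True):
    # A query only succeeds if there is a route to the server
    # AND the server itself is up.
    if table.lookup(dns_server_ip) is None:
        return "unreachable: no route to DNS server"
    if not dns_server_running:
        return "unreachable: DNS server down"
    return "resolved"


table = RoutingTable()
# Hypothetical prefix from the documentation range, not a real Facebook prefix.
table.announce("192.0.2.0/24", "hypothetical-backbone")
before = resolve(table, "192.0.2.53")

# Simulate a faulty change that withdraws the route.
table.withdraw("192.0.2.0/24")
after = resolve(table, "192.0.2.53")

print(before)
print(after)
```

In this sketch the DNS server is marked as running throughout, so the only thing that changes between the two lookups is the route.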

Janardhan also confirmed reports that Facebook's internal tools were impacted by the outage, complicating efforts to diagnose and solve the problem. According to The New York Times, security engineers were unable to gain physical access to affected servers because their digital badges were rendered inoperable.

Apparently fearful of rumors that its system was hacked, Facebook in the blog post reiterates that the outage was caused by a "faulty configuration change" and notes that no user data was compromised as a result of the downtime.