By Staff Writer
A six-hour outage left more than 3.5 billion Facebook, Messenger, Oculus, Instagram and WhatsApp users offline early on Tuesday (Australia time). Despite early fears of a cyberattack, parent company Facebook soon confirmed the cause was closer to home.
The outage, which occurred during the Monday business day in the United States and early evening in the UK and Europe, was Facebook’s most serious disruption since March 2019, when a 14-hour outage cut off users from Facebook’s various social media platforms.
Web monitoring company DownDetector said this week’s outage was the biggest it had seen.
On Tuesday, Facebook quickly denied its systems had been compromised, instead citing a “routine maintenance error” as the cause of the outage.
The outage affected users of Facebook’s many social media apps, including Instagram, WhatsApp, Messenger and Oculus, which began displaying error messages.
There was a cascade effect because many people use Facebook to sign into other apps and services. Consequently, many users could not log into retail websites or sign into their smart TVs, air conditioning systems, and other internet-connected devices.
“This outage was triggered by the system that manages our global backbone network capacity,” said vice-president of engineering at Facebook, Santosh Janardhan.
Facebook’s data centres span the globe and are connected by fibre-optic cables. Janardhan notes the data centres come in all shapes and sizes, from big buildings that handle the grunt work of big computational loads to smaller centres that connect critical Facebook infrastructure to the broader internet and users.
When a Facebook user opens the platform to upload data, be it a message or multimedia, their device generally accesses the closest data centre. That data centre then communicates over what Facebook calls its “backbone network” to a larger data centre where the user’s request is processed.
On Tuesday, Facebook inadvertently broke its backbone network.
While routine maintenance work was underway at a data centre, technicians issued a command intending to assess the availability of global backbone capacity. This unintentionally took down all the connections in Facebook’s backbone network, disconnecting data centres worldwide. Sources outside Facebook identify the outage as stemming from a data centre in Santa Clara, California.
With the backbone network down, DNS servers went offline, complicating the problem. Internal systems at Facebook also went offline, adding further complexity.
“Our systems are designed to audit commands like these to prevent mistakes like this, but a bug in that audit tool prevented it from properly stopping the command,” said Janardhan.
“Today’s outage brought our reliance on Facebook, and its properties like WhatsApp and Instagram, into sharp relief,” Professor of Communications at Cornell University, Brooke Duffy, told The New York Times.
“The abruptness of today’s outage highlights the staggering level of precarity that structures our increasingly digitally-mediated work economy.”
Six hours after the outage began, Facebook had its systems up and running again. CEO Mark Zuckerberg subsequently apologised to the billions of people who use his apps. Santosh Janardhan was sanguine, calling the outage “an opportunity to learn and get better.”