October 20, 2025
My smart home stopped being smart. Alexa was silent as a tomb. The Ring doorbell showed "connecting…" indefinitely. Snapchat, Fortnite, and other normally reliable services turned into screens displaying: Warning - Technical Maintenance.
It wasn’t a cyberattack. It wasn’t a hardware failure.
It was three letters: DNS.
What Actually Happened
The timeline of events from AWS’s official reports looked like this:
23:49 PDT ─┐
│ DNS resolution error for DynamoDB endpoints
├─> DynamoDB API errors (US-EAST-1)
│
02:24 PDT ─┤ DNS FIXED! 🎉
│ ...but...
│
├─> EC2 launch system failed (DynamoDB dependency)
│ └─> RDS, ECS, Glue - everything using EC2
│
09:38 PDT ─┤ Network Load Balancer health checks failure
│ └─> Lambda execution environments
│ └─> CloudWatch metrics
│ └─> Total connection chaos
│
15:01 PDT ─┴─> FULL RESTORATION (15h 12min later)
Affected services:
├─ Gaming: Fortnite, Roblox, PlayStation Network
├─ Social: Snapchat, Signal
├─ Finance: Coinbase, Robinhood, Venmo, US banks
├─ Amazon's own: Alexa, Ring, Prime Video
├─ Atlassian: Jira, Confluence
└─ Infrastructure: IAM, Support tickets, DynamoDB Global Tables
DNS: The 12 Bytes That Stopped Half the Internet
What exactly is DNS, and why did its failure cause such massive problems and losses?
DNS is essentially the internet's phone book. Imagine we didn't have DNS and wanted to search something on Google. Instead of google.com, we'd have to type an IP address, like 142.250.179.142. Could you remember a string of numbers instead of google.com? Probably not!
DNS is our phone book, letting us use easy-to-remember web addresses instead of raw IPs.
How Does It Work?
Code is worth more than a thousand words. Using C and this very simplified program, let's query a DNS server for the DynamoDB IP! (Or you can use ready-made tools such as host or nslookup.)
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main() {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dns = {AF_INET, htons(53), {inet_addr("8.8.8.8")}};

    // This is the ENTIRE DNS query - just a 12-byte header + the domain name
    unsigned char q[] = {
        0xAA,0xAA,0x01,0x00,0,1,0,0,0,0,0,0,   // DNS header: ID, flags (RD), 1 question
        8,'d','y','n','a','m','o','d','b',     // each label is prefixed with its length
        9,'u','s','-','e','a','s','t','-','1',
        9,'a','m','a','z','o','n','a','w','s',
        3,'c','o','m',0,                       // a zero byte terminates the name
        0,1,0,1                                // Type A, Class IN
    };
    unsigned char r[512];

    sendto(sock, q, sizeof(q), 0, (struct sockaddr*)&dns, sizeof(dns));
    int len = recvfrom(sock, r, 512, 0, NULL, NULL);

    // A DNS response has a complex structure, but for a single A record
    // the IPv4 address sits in the last 4 bytes
    if (len >= 4)
        printf("IP: %d.%d.%d.%d\n", r[len-4], r[len-3], r[len-2], r[len-1]);
    close(sock);
    return 0;
}
And the response looks like this:
IP: 3.218.180.124
53 is the DNS port, and 8.8.8.8 is Google's public DNS server (you can use any DNS server). That's all you need to ask where dynamodb.us-east-1.amazonaws.com is.
DNS servers use the UDP protocol, which is much simpler than TCP and therefore very fast. Like any protocol, it is standardized - in RFC 1035.
The outage itself affected DynamoDB - a NoSQL database used for storing and processing large amounts of data, especially at high scale and traffic intensity.
Even those who don’t use DynamoDB were affected by the outage because, for example, EC2 internally uses DynamoDB to store metadata.
DynamoDB as Distributed Architecture
DynamoDB, like other AWS services, operates in a distributed architecture, which we can illustrate with the following diagram:
🏗️ DynamoDB ARCHITECTURE (distributed)
LOAD BALANCER VIP
(Virtual IP: 52.94.133.131)
↓
┌───────────────────┼───────────────────┐
│ │ │
┌───▼────┐ ┌───▼────┐ ┌───▼────┐
│ Node 1 │ │ Node 2 │ │ Node 3 │
│ AZ-1a │ │ AZ-1b │ │ AZ-1c │
│ IP │ │ IP │ │ IP │
└────────┘ └────────┘ └────────┘
✅ ✅ ✅
WORKING WORKING WORKING
Since everything is distributed - even the DNS servers - what happened? The entire DNS system for us-east-1 had problems.
As it turns out, “multi-region” requires true independence, not just data replication.
How does this relate to our DNS problem? Many AWS services, despite being located in different places, had one point of contact - in this case, DNS address resolution in us-east-1.
// Entire infrastructure dependent on a single service
DynamoDB_IP = dns_resolve("dynamodb.us-east-1.amazonaws.com");
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                   When this fails - everything fails!
What Did This Mean in Practice?
The problem wasn’t in DynamoDB. The problem wasn’t in the servers. The problem was in the MAP.
All AWS services were asking:
Where is dynamodb.us-east-1.amazonaws.com?
And nobody could answer. Servers were running. Data was safe. But NOBODY knew how to get there.
What Does This Mean for Us and What Lessons Can We Learn?
- The cloud isn't magic, just someone else's servers that can fail
- Multi-region doesn't mean independence (something can always use a single location we don't know about)
- Test failures, especially network failures like DNS
- Don't rely on a single DNS provider