Amazon Web Services (AWS) was at the center of a major incident that disrupted millions of its customers around the globe. Within an hour of the start of the outage, more than 1,000 companies had seen their services go offline. The disruption brought down ride-sharing apps, digital wallets, and popular online games, among many other things. Engineers traced the failure to the US-EAST-1 region in Northern Virginia, the company's oldest and largest data center cluster. This is the third time in five years that a dramatic, widespread Internet disruption has been traced back to this same location.
What caused the AWS outage?
The multi-hour AWS outage was primarily caused by a failure in the Domain Name System (DNS) infrastructure. DNS is often compared to the phone book of the Internet because it converts human-readable domain names into machine-readable IP addresses. After the incident, AWS identified a "latent defect" in the DNS automation it runs for its DynamoDB database service as the origin of the problem.
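To see why a DNS failure blocks otherwise healthy services, consider the minimal Python sketch below. The hostname is a made-up placeholder, not a real AWS endpoint: an application must resolve a name to an IP address before it can open a connection, so if resolution fails, the request never reaches the server at all.

```python
import socket

# Hypothetical service endpoint; any cloud-hosted API follows the same pattern.
HOSTNAME = "dynamodb.us-east-1.example.com"

try:
    # Step 1: DNS resolution -- translate the name into an IP address.
    ip_address = socket.gethostbyname(HOSTNAME)
    print(f"Resolved {HOSTNAME} to {ip_address}")
except socket.gaierror as err:
    # If the DNS records are missing or wrong, this is where the request dies:
    # the servers may be perfectly healthy, but clients cannot find them.
    print(f"DNS resolution failed: {err}")
```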
This brought about a "race condition," in which several automated operations, all running simultaneously and all trying to correct the issue, ended up conflicting with one another. These uncoordinated, overlapping efforts effectively canceled each other out. For example, because of high system delays, a safeguard intended to prevent an old DNS plan from overwriting a newer one failed, and an outdated, incorrect DNS plan was applied. With the wrong records in place, a huge number of applications and services could no longer find their endpoints, bringing them to a standstill.
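The sketch below is not AWS's actual automation; it is only a generic illustration of the kind of race condition described, with invented version numbers and records. Two workers apply DNS "plans" concurrently, and because the version check is missing, the stale plan that finishes last silently overwrites the newer one.

```python
import threading
import time

# Shared state: the currently active DNS plan (version number plus records).
active_plan = {"version": 0, "records": {}}

def apply_plan(version, records, delay):
    """Worker that applies a DNS plan after some processing delay."""
    time.sleep(delay)  # stands in for the high system latency during the incident
    # BUG: no check that `version` is newer than the active plan, so a stale
    # plan that finishes late overwrites a newer one.
    active_plan["version"] = version
    active_plan["records"] = records

# The newer plan (v2) finishes quickly; the stale plan (v1) finishes late.
t_new = threading.Thread(target=apply_plan, args=(2, {"db": "10.0.0.2"}, 0.1))
t_old = threading.Thread(target=apply_plan, args=(1, {"db": "10.0.0.1"}, 0.5))
t_new.start(); t_old.start()
t_new.join(); t_old.join()

print(active_plan)  # ends on version 1: the outdated records won the race
```

A guarded update, applying a plan only if its version is newer than the active one, is the kind of safety check that, according to AWS's account, failed under heavy delays.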
What prompted the global online disruption?
Two factors explain the worldwide scale of the disruption: the central role of DNS and AWS's market dominance. A DNS error is just one of many kinds of failures that can knock services offline, and it usually affects only individual sites or services. But because AWS is the largest cloud provider in the world, a single DNS failure in its system set off a chain of events with worldwide impact.
The problem was made worse by the failure of a key resiliency system, the Network Load Balancer (NLB), which directs traffic to servers that are working properly. Under the heavy load, a related network health-check system began to degrade. As a result, health checks failed intermittently even though the underlying servers were healthy, so the NLB wrongly routed traffic away from nodes that were still working, intensifying the service disruptions for end users.
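The following sketch is not AWS's implementation; it is a generic illustration, with made-up server addresses and failure rates, of how a degraded health-check subsystem can pull healthy servers out of rotation.

```python
import random

SERVERS = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]  # hypothetical backend nodes

def probe(server, checker_failure_rate):
    """One health probe. Returns False when the probe itself times out,
    even though the backend server may be perfectly healthy."""
    return random.random() > checker_failure_rate

def healthy_targets(servers, checker_failure_rate):
    """Targets the load balancer keeps in rotation after one probe round."""
    return [s for s in servers if probe(s, checker_failure_rate)]

# With a healthy checking subsystem, every node stays in rotation.
print(healthy_targets(SERVERS, checker_failure_rate=0.0))

# With a degraded checker that times out ~70% of the time, healthy nodes get
# ejected, and in the worst case no targets remain for traffic to reach.
print(healthy_targets(SERVERS, checker_failure_rate=0.7))
```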
Which popular websites and apps have been impacted?
The failure affected a wide range of popular online services. As the US-EAST-1 region went down, problem reports spiked across many platforms, among them:
Social Media: Reddit was down for approximately 30 minutes.
Video Streaming: YouTube experienced significant disruptions.
Telecom & Services: Verizon and Ring services were impacted.
Other Services: Banks, airlines, and other apps relying on AWS infrastructure also faced outages.
The impact came in waves, with signs of initial recovery overnight Pacific time, followed by a second surge of reports as the West Coast started its workday.
What has Amazon said about the global outage?
Amazon and its AWS division have not held back on the details behind the incident. They pinpointed the root cause as a race condition in an internal subsystem that automates DNS and network health checks. Through its status page and other channels, AWS described the steps it took to resolve the issue, including mitigating the underlying DNS problem and temporarily limiting some functions, such as new EC2 instance launches, to let the system recover. The company also committed to publishing a complete "post-event overview," but warned that this detailed report might take several weeks or months to be ready.
How long did the outage last?
The downtime stretched over several hours. Just after midnight PT on Monday, AWS first reported that it was investigating increased error rates. By 3:35 a.m. PT, the company announced that the root DNS issue had been "fully mitigated" and that services were returning to normal. Nevertheless, a wave of Network Load Balancer problems followed later that morning as usage spiked. It was not until 3:53 p.m. PT that same day that Amazon made the final announcement that all issues had been resolved and normal operations had been restored.
Why is an AWS outage such a big deal?
An AWS outage has such an enormous effect because so much of the modern Internet is built on its infrastructure. It is a classic illustration of the perils of concentration: millions of businesses, from startups to Fortune 500 companies, depend on a single provider. While pundits talk about the need for diversification, in reality there are few alternatives of comparable scale, with only Microsoft Azure and Google Cloud Platform as roughly equal competitors.
This dependence has ignited discussions in the UK and Europe about building their own cloud infrastructure to reduce reliance on the US tech giants. But the high cost and the dominance of the existing players make it a hard proposition. Events like this outage underscore the systemic risk and complexity of today's cloud ecosystem.
Is Amazon down because of the outage?
The internal DNS and networking issues during the event also affected Amazon's own services and the AWS management console. Nevertheless, "Amazon down" here mostly refers to the massive blackout of third-party services and popular websites that depend on AWS infrastructure. The main problem was not that Amazon.com was completely offline, but that the underlying cloud platform used by so many companies suffered a critical failure.
What is the history of AWS region outages?
In recent years, AWS has suffered a number of significant outages, with the US-EAST-1 region in Northern Virginia the source of most of them. This 2025 event is, in fact, the third major Internet outage in the last five years traced to this particular data center cluster. The pattern highlights the risk of such a critical and concentrated piece of global digital infrastructure, where a single failure can have disproportionate consequences. If you want early warning when your own sites are struggling, a website monitoring tool can help you spot problems before they turn into full outages.
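As a minimal sketch of what such monitoring does (the URL, polling interval, and timeout below are placeholders, and a real service would add alerting, retries, and checks from multiple regions), the Python script periodically requests a page and flags slow or failing responses:

```python
import time
import urllib.error
import urllib.request

URL = "https://example.com/health"   # placeholder endpoint to monitor
CHECK_INTERVAL_SECONDS = 60          # assumed polling interval
TIMEOUT_SECONDS = 5                  # assumed "too slow" threshold

def check_once(url):
    """Runs a single availability probe and returns (ok, detail)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as resp:
            elapsed = time.monotonic() - start
            return resp.status == 200, f"status={resp.status} latency={elapsed:.2f}s"
    except (urllib.error.URLError, TimeoutError) as err:
        return False, f"request failed: {err}"

while True:
    ok, detail = check_once(URL)
    print(("OK  " if ok else "DOWN") + f" {URL} -> {detail}")
    # A real monitor would record metrics or page someone here.
    time.sleep(CHECK_INTERVAL_SECONDS)
```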

