Amazon’s ecommerce business has summoned a large group of engineers to a meeting on Tuesday for a “deep dive” into a spate of outages, including incidents tied to the use of AI coding tools.
The online retail giant said there had been a “trend of incidents” in recent months, characterised by a “high blast radius” and “Gen-AI assisted changes” among other factors, according to a briefing note for the meeting seen by the FT.
Under “contributing factors” the note included “novel GenAI usage for which best practices and safeguards are not yet fully established”.
“Folks, as you likely know, the availability of the site and related infrastructure has not been good recently,” Dave Treadwell, a senior vice-president at the group, told employees in an email, also seen by the FT.
The note ahead of Tuesday’s meeting did not specify which particular incidents the group planned to discuss.
Amazon’s website and shopping app went down for nearly six hours this month in an incident the company said involved an erroneous “software code deployment”. The outage left customers unable to complete transactions or access functions such as checking account details and product prices.
Treadwell, a former Microsoft engineering executive, told employees that Amazon would focus its weekly “This Week in Stores Tech” (TWiST) meeting on a “deep dive into some of the issues that got us here as well as some short immediate term initiatives” the group hopes will limit future outages.
He asked staff to attend the meeting, which is normally optional.
Junior and mid-level engineers will now require more senior engineers to sign off any AI-assisted changes, Treadwell added.
Amazon said the review of website availability was “part of normal business” and it aims for continual improvement.
“TWiST is our regular weekly operations meeting with a specific group of retail technology leaders and teams where we review operational performance across our store,” the company said.
Separately, the company’s cloud computing arm — Amazon Web Services — has suffered at least two incidents linked to the use of AI coding assistants, which the company has been actively rolling out to its staff.
AWS suffered a 13-hour interruption to a cost calculator used by customers in mid-December after engineers allowed the group’s Kiro AI coding tool to make certain changes, and the AI tool opted to “delete and recreate the environment”, the FT previously reported.
Amazon previously said the incident in December was an “extremely limited event” affecting only a single service in parts of mainland China. Amazon added that the second incident did not have an impact on a “customer facing AWS service”.
The FT previously reported multiple Amazon engineers said their business units had to deal with a higher number of “Sev2s” — incidents requiring a rapid response to avoid product outages — each day as a result of job cuts.
Amazon has undertaken multiple rounds of lay-offs in recent years, most recently eliminating 16,000 corporate roles in January. The group has disputed the claim that headcount cuts were responsible for an increase in recent outages.
Gonna see a lot more of this in the coming years. The real cost of LLM tools has a delay. Devs don't tend to notice it until they're neck deep in code they don't understand, swearing the next prompt will get them out. CEOs won't notice until it starts costing them money, and that of course assumes anyone will be willing to admit it. A lot of people have their careers on the line spending a metric shit ton of money on untested tools.
Didn't realize this post got traction; it seems like it was HN pooled. I came across this article and related topics while searching for the most rigorous treatment of something close to Wigner's "unreasonable effectiveness of mathematics"; renormalization groups were the closest I came across. The post title doesn't match the story title, likely because the story was switched to a more detailed article I had considered posting. The title is from a Quanta video covering universality, linked below.
> McCarthy looked at those and he said every house is going to
have one of these. And he didn't worry about personal computing
or anything because what he thought—what the thing that occurred
to him—this is like getting your power from the outside, this is
like getting your water from the outside, it's going to be a
utility. It'll be a universal utility, be like the telephone.
Everybody will have one of these things and there will be national
computing centers that everybody can tap into. And he started
thinking about it and he realized: oh, the problem is nobody can
deal with computers on the computer's terms. We need a common
sense way of dealing with computation. And so he proposed that
there be an agent called the advice taker that you could have
back and forth with, and not just ask it questions but tell it
things and have it reason. And there's a whole bunch of
interesting stuff in there.