Amazon reveals new chips for training and running AI models.
The rising demand for generative AI, which typically depends on GPU power, has resulted in a GPU shortage, with Nvidia's top chips being sold out until 2024. TSMC's CEO even indicated that this shortage might continue into 2025. In response, leading tech companies are developing custom chips for AI model creation and deployment. Amazon has recently unveiled its new generation of such chips at its re:Invent conference.
AWS Trainium2, the first of these new chips, promises to offer four times the performance and double the energy efficiency of its predecessor, Trainium, introduced in 2020. Available on AWS in clusters of 16 chips, Trainium2 can scale up to 100,000 chips in AWS’ EC2 UltraCluster, delivering 65 exaflops of compute power. This means that a single Trainium2 chip could potentially deliver around 650 teraflops, significantly surpassing Google's custom AI training chips from 2017.
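The 650-teraflop figure follows directly from the cluster numbers Amazon quoted. A minimal back-of-the-envelope sketch, assuming the 65-exaflop cluster figure divides evenly across all 100,000 chips (which ignores interconnect and scaling overhead):

```python
# Back-of-the-envelope check of the per-chip figure implied by
# Amazon's announced numbers: 65 exaflops across a 100,000-chip
# EC2 UltraCluster. Assumes ideal, even scaling across chips.

CLUSTER_FLOPS = 65e18  # 65 exaflops, per Amazon's announcement
CHIP_COUNT = 100_000

per_chip_teraflops = CLUSTER_FLOPS / CHIP_COUNT / 1e12
print(f"{per_chip_teraflops:.0f} teraflops per chip")  # prints "650 teraflops per chip"
```

In practice real clusters lose some throughput to communication overhead, so the actual per-chip peak could be somewhat higher than this naive division suggests.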
According to Amazon, a cluster of 100,000 Trainium2 chips can cut the training time of a 300-billion-parameter AI language model from months to weeks. Trainium2 instances will become available to AWS customers in the coming year.
Amazon also introduced Graviton4, an Arm-based chip for inferencing tasks. As the latest in the Graviton series, Graviton4 offers up to 30% better compute performance, 50% more cores, and 75% more memory bandwidth compared to Graviton3. It also features fully encrypted physical hardware interfaces, enhancing the security of AI training workloads and data. Graviton4 will power Amazon EC2 R8g instances, currently available in preview with a full launch expected soon.
These developments from Amazon, focusing on specialized chips for AI tasks, highlight the industry's shift towards custom hardware solutions to address the growing demands and complexities of AI and machine learning workloads.