As a software engineer, I have spent my career making the Internet a slightly better place for millions of users, touching their daily lives and leaving a positive impression in one way or another. Writing a protocol test suite for the Internet backbone, building services for millions of subscribers, processing web content for a better search experience, and lately expanding the selection of products people can buy or sell online, my journey took me through different phases of the digital revolution. Cloud computing, web search technology, Hadoop, and NoSQL are some of the advances that pushed the envelope of digital technology over the last two decades. As the industry made steady progress on multiple fronts, fast and powerful silicon chips enabled technological breakthroughs in the field of Artificial Intelligence (AI). Thanks to rapid innovation in physics (electronics in particular), modern machines can read and speak, play superior chess, and arguably drive a car better than we can. With machines taking on more and more of our cognitive load, we keep handing them even more complex tasks to accomplish. Silicon has become a major driving force of this virtuous cycle of AI in recent times. The NVIDIA A100 Tensor Core GPU, Google Cloud TPU, Amazon Inferentia, and Graviton2 are testimony to the technological disruption the digital world is going through. In this article, I attempt to compare Graviton2-powered EC2 instances with the existing compute-class instance offerings from AWS.
What is Graviton2?
Graviton is a custom, server-grade AWS chip built on Arm processor cores. At re:Invent 2018, Amazon took a step towards making computation faster and cheaper for AWS customers by offering Graviton-powered Elastic Compute Cloud (EC2) instances in addition to its variety of Intel and AMD processor based EC2 instances. Launching its own silicon for cloud-native workloads gave Amazon the ability to rapidly innovate, build, and iterate on behalf of customers. Graviton2, the next-generation chip with a 64-core Arm Neoverse processor, was launched at re:Invent 2019 and outperformed the older-generation Graviton by an impressive margin. Faster than the closest x86-family processors, Graviton2 is claimed to provide up to 40% better price-performance to customers.
Benchmarking Graviton2
Ever since its launch, Graviton2 has been benchmarked against different types of cloud-native workloads, such as HTTPS load balancing with Nginx, Memcached, and x264 video encoding. AWS published performance benchmarks in its announcement blog, and similar benchmarking was performed by AWS partners like Treasure Data, independent reviewers like AnandTech, and OSS communities like KeyDB. The consistent performance boost across the board, along with the better price point, was my motivation to do the same for our workload. We classify billions of products sold online using ML, search, and similarity measures, so our data-processing pipeline is always CPU bound. With AWS EMR adding Graviton2 support in Q4 2020, Spark on EMR remained my distributed data-processing engine of choice. For this experiment, we chose a fairly simple CPU-bound workload (Java regex-based matching) over 2 TB of data on an EMR cluster with 10 EC2 instances, using AWS S3 for input and output storage; a simplified sketch of such a job appears after the setup details below.
I built the code with JDK 8 on the Amazon Linux 2 platform, on both x86 (Intel Xeon) and aarch64 (Graviton2) machines. Across all runs, everything was kept identical except the EC2 instance type, for obvious reasons. I chose the closest comparable compute-class EC2 instances: C4 (previous-generation Intel x86), C5 (current-generation Intel x86), M5a (current-generation AMD x86), and C6g (current-generation Graviton2 aarch64). More details below –
AWS EMR Version: 5.31
Apache Spark Version: 2.4
Build OS: Amazon Linux 2
Java Version: OpenJDK 8 (Amazon Corretto)
EMR Cluster Topology: 1 master & 10 core instances
Input Data Size: 2 TB
Output Data Size: 350 MB
EC2 variants used: c4.8xlarge, c5.9xlarge, m5a.8xlarge & c6g.8xlarge
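To make the workload concrete, here is a minimal sketch of the kind of CPU-bound, regex-based matching job I ran on this cluster. The class name, S3 paths, and the pattern below are illustrative placeholders rather than the actual production code, but the overall shape (read text from S3, filter with a compiled java.util.regex pattern, write the small matched subset back to S3) is the same:

```java
import java.util.regex.Pattern;

import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

/**
 * Minimal sketch of a CPU-bound, regex-based matching job on Spark 2.4 / JDK 8.
 * The S3 paths and the pattern are illustrative placeholders, not the real pipeline.
 */
public final class RegexMatchJob {

    // Compile the pattern once; java.util.regex.Pattern is thread-safe,
    // and referencing a static field keeps the lambda closure small.
    private static final Pattern PRODUCT_ID =
            Pattern.compile("^[A-Z0-9]{10}(\\|[A-Z0-9]{10})*$");

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("graviton2-regex-benchmark")
                .getOrCreate();

        // ~2 TB of line-delimited text read directly from S3 (placeholder path).
        Dataset<String> lines = spark.read().textFile("s3://my-bucket/input/");

        // The regex evaluation on every line dominates the run time,
        // which keeps the job CPU bound on the executors.
        Dataset<String> matched = lines.filter(
                (FilterFunction<String>) line -> PRODUCT_ID.matcher(line).matches());

        // The matched subset is small (~350 MB) and written back to S3 (placeholder path).
        matched.write().text("s3://my-bucket/output/");

        spark.stop();
    }
}
```

Because the regex match runs on every input line, almost all of the run time is spent on the executors' CPUs, which is exactly the behaviour we want when comparing instance types.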
We got a little over 4% speedup on the Graviton2 (C6g) instances compared to the current-generation compute-class (C5) instances, even with fewer vCPUs (32 vs. 36). One possible reason is that each vCPU on Graviton2 is a physical core, whereas the Intel Xeon powered C5 presents its 36 vCPUs as 18 physical cores with SMT. The biggest benefit of the C6g instances comes from their price point: roughly 29% cheaper than C5, which gave us about a 32% price-performance gain. The other variants, M5a and C4, performed significantly slower than C5 or C6g, making them a poor choice for this CPU-bound workload.
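For anyone who wants to verify the arithmetic, the 32% figure follows from combining the two ratios: the total cost of the job scales with hourly price multiplied by run time. A quick back-of-the-envelope check using the rounded figures quoted above (not exact billing numbers):

```java
// Back-of-the-envelope check of the price-performance gain using the rounded
// figures from the runs above (~29% cheaper per hour, ~4% faster).
public final class PricePerformanceCheck {
    public static void main(String[] args) {
        double priceRatio = 1.0 - 0.29;  // C6g hourly price relative to C5
        double timeRatio  = 1.0 / 1.04;  // C6g run time relative to C5

        // Total job cost scales with (hourly price) x (run time).
        double costRatio = priceRatio * timeRatio;  // ~0.68

        System.out.printf("C6g cost for the same job: ~%.0f%% of C5%n", costRatio * 100);
        System.out.printf("Price-performance gain: ~%.0f%%%n", (1.0 - costRatio) * 100);
    }
}
```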