Are you running on Amazon’s cloud and struggling with big data? Then check this out: Amazon Web Services today released the new D2 series of EC2 instances, dense-storage instances designed to handle multi-terabyte data sets.
The new instances provide more than just better CPU and memory specs. They are geared for sustained, high-rate sequential disk I/O against extremely large data sets, reaching up to 3,500 MB/second read and 3,100 MB/second write performance on the largest instance size. They also come with enhanced networking and support for placement groups, which boosts network throughput between instances in the group. This makes the D2 instances a natural fit for use cases such as Hadoop clusters and their MapReduce jobs, massively parallel processing (MPP) data warehouses, log processing, and the like.
It’s important to note that the disk I/O boost applies to the local ephemeral storage, which is lost when the EC2 instance is terminated. So it is up to you to provide redundancy for the data as needed, whether with RAID or with a distributed file system such as HDFS or GlusterFS. The new instances also come EBS-optimized by default, so you can offload local data as needed to EBS (Amazon’s native block storage) volumes over dedicated bandwidth that doesn’t compete with your regular network traffic.
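As a rough sketch of putting those local disks to work, the ephemeral volumes can be striped into a single array with mdadm. The device names, disk count, and mount point below are assumptions for illustration, not from the announcement; check /proc/partitions on your own instance.

```shell
# Hypothetical sketch: stripe a d2 instance's ephemeral disks into one
# RAID 0 array for maximum sequential throughput. Device names and the
# number of disks are assumptions -- verify them on your instance.
sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde

sudo mkfs.ext4 /dev/md0          # format the striped array
sudo mkdir -p /mnt/data
sudo mount /dev/md0 /mnt/data    # mount point for HDFS data dirs, logs, etc.
```

Note that RAID 0 trades redundancy for speed: it maximizes throughput but a single disk failure loses the array, so the durability advice above (HDFS replication, or offloading to EBS) still applies on top of it.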
The Amazon folks did nice work integrating advanced features of the Linux kernel and of the Intel Xeon CPUs. If you need to chew through massive data sets, you’ll want to check it out.
Follow Dotan on Twitter!