Amazon Elastic MapReduce (EMR), an AWS service that lets you easily process vast amounts of data on scalable EC2 instances, previously released a slew of updates, including S3 encryption support, EMRFS consistent view, and enhanced CloudWatch metrics.
Now, EMR has released version 4.0.0, which brings further major enhancements:
- Apache Hadoop 2.6.0 – improvements to functionality, authentication, metrics, HCFS, HDFS, and YARN.
- Hive 1.0 – enhancements to performance, SQL support, and security.
- Pig 0.14 – ORCStorage class, predicate pushdown, bug fixes, and several other changes.
- Spark 1.4.1 – new DataFrame API and bindings for SparkR, and more.
- Quick Cluster Creation – rapidly create clusters in the Console via quick cluster configuration. You can find this option under the EMR Quick Create menu.
- Better Application Configuration Editing – instead of using bootstrap actions, you can edit configurations directly by passing a configuration object that lists the configuration files to edit and the settings to change in each. Read the complete instructions in the Configuration Guide.
- New Packaging System – the release packaging system has moved to Apache Bigtop, allowing applications to move into EMR more quickly. Ports and paths on EMR have also been changed to follow open source standards.
- Extra EMR configuration options for Spark – Spark on YARN can dynamically set the number of executors for Spark applications; you enable this by editing the spark-defaults configuration file. For more information, check out how to Configure Spark.
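
As a rough illustration of the configuration object mentioned in the list above, the Python sketch below builds a list with a single entry that targets the spark-defaults file and turns on dynamic executor allocation. The "Classification"/"Properties" layout and the spark.dynamicAllocation.enabled property are assumptions drawn from the EMR 4.x and Spark documentation rather than from this announcement, and the helper name make_configuration is hypothetical. Check the Configuration Guide for the authoritative format.

```python
import json


def make_configuration(classification, properties):
    """Build one entry of a configuration list (hypothetical helper).

    classification: name of the configuration file to edit,
                    for example "spark-defaults" (assumed naming).
    properties:     mapping of setting name to its new value, as strings.
    """
    return {
        "Classification": classification,
        "Properties": dict(properties),
    }


# One entry per configuration file that needs changing.
configurations = [
    make_configuration(
        "spark-defaults",
        # Assumed Spark property for dynamic executor allocation.
        {"spark.dynamicAllocation.enabled": "true"},
    ),
]

# Serialize to JSON so it can be saved to a file or passed along
# when the cluster is created.
configurations_json = json.dumps(configurations, indent=2)
print(configurations_json)
```

The printed JSON could be saved to a file and supplied when the cluster is created; the exact mechanism (Console, CLI, or SDK) is described in the Configuration Guide.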