Amazon Elastic MapReduce (EMR), an AWS service for processing large volumes of data on scalable clusters of EC2 instances, has previously shipped a series of updates, including S3 encryption support, EMRFS consistent view, and enhanced CloudWatch metrics.
Now, EMR has released version 4.0.0, which brings further major enhancements:
- Apache Hadoop 2.6.0 – improvements to functionality, authentication, metrics, HCFS, HDFS, and YARN.
- Hive 1.0 – enhancements to performance, SQL support, and security.
- Pig 0.14 – the OrcStorage class, predicate pushdown, bug fixes, and several other changes.
- Spark 1.4.1 – the new DataFrame API, SparkR bindings, and more.
- Quick cluster creation – Create clusters rapidly in the console using quick cluster configuration. You can find this option under the EMR Quick Create menu.
- Better Application Configuration Editing – Instead of using bootstrap actions, you can now edit configurations directly by passing a configuration object. The object lists the configuration files to edit and the settings to change in each file. Read the complete instructions in the Configuration Guide.
- New Packaging System – The release packaging system has moved to Apache Bigtop, which allows applications to reach EMR more quickly. Ports and paths on EMR have also been changed to follow open source standards.
- Extra EMR configuration options for Spark – Spark on YARN can dynamically set the number of executors for Spark applications. You enable this by editing the spark-defaults configuration file. For more information, check out how to Configure Spark.
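To make the configuration object mentioned above more concrete, here is a minimal sketch in Python of what such an object might look like for the spark-defaults file. It assumes the list-of-classifications JSON shape (`Classification` plus `Properties`) used by EMR release 4.x. The helper name `build_configurations` is hypothetical, and the two Spark properties are taken from the Apache Spark documentation on dynamic allocation, not from this announcement. Treat it as an illustration and check the Configuration Guide for the settings your cluster actually needs.

```python
import json


def build_configurations(overrides):
    """Convert {classification: {property: value}} into a list of
    configuration entries, one per configuration file to be edited.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    return [
        {
            "Classification": classification,
            # Property values are passed as strings.
            "Properties": {key: str(value) for key, value in properties.items()},
        }
        for classification, properties in overrides.items()
    ]


# Assumed settings: Spark's dynamic executor allocation, which also
# relies on the external shuffle service being enabled.
spark_overrides = {
    "spark-defaults": {
        "spark.dynamicAllocation.enabled": "true",
        "spark.shuffle.service.enabled": "true",
    }
}

configurations = build_configurations(spark_overrides)
print(json.dumps(configurations, indent=2))
```

The printed JSON could be saved to a file and supplied when creating a cluster, for example through the `--configurations` option of `aws emr create-cluster` or the `Configurations` parameter of boto3's `run_job_flow`. Keeping the overrides in a plain dictionary makes it easy to add further classifications without touching the helper.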
Version 4.0.0 is available now. You can learn more about how Amazon EMR can benefit your organization from our AWS-accredited experts here at PolarSeven.