Amazon EMR (Elastic MapReduce) is a great way to analyze petabytes of data efficiently. AWS has now made it even better by adding support for Apache Tez and Apache Phoenix, along with updated versions of several applications it already supported.
Apache Tez
Tez provides a set of APIs for defining data processing jobs as a directed acyclic graph (DAG) of tasks. It can run these workloads faster than Hadoop MapReduce, and you can use it as the execution engine for both Hive and Pig. You can read more about it in the EMR documentation.
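To make the DAG idea concrete, here is a minimal sketch in plain Python. It is not the Tez API: the stage functions, the word-count pipeline and the `run_dag` helper are all hypothetical. It only shows what "a job as a DAG" means. Each stage declares which stages it depends on, stages run in topological order, and one stage's output (`count`) fans out to two downstream stages within the same job.

```python
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical stages; each receives a dict of its upstream stages' outputs.
def read(_):
    return ["tez runs dags", "dags beat chained jobs", "tez dags"]

def tokenize(deps):
    return [word for line in deps["read"] for word in line.split()]

def count(deps):
    return Counter(deps["tokenize"])

def top_word(deps):
    return deps["count"].most_common(1)[0]

def total(deps):
    return sum(deps["count"].values())

# Edges: stage -> set of stages it depends on.
# "count" fans out to two downstream stages in the same graph.
DAG = {
    "read": set(),
    "tokenize": {"read"},
    "count": {"tokenize"},
    "top_word": {"count"},
    "total": {"count"},
}
STAGES = {"read": read, "tokenize": tokenize, "count": count,
          "top_word": top_word, "total": total}

def run_dag(dag, stages):
    """Execute stages in topological order, feeding each its upstream outputs."""
    results = {}
    for name in TopologicalSorter(dag).static_order():
        upstream = {dep: results[dep] for dep in dag[name]}
        results[name] = stages[name](upstream)
    return results

if __name__ == "__main__":
    out = run_dag(DAG, STAGES)
    print(out["top_word"], out["total"])  # ('dags', 3) 9
```

A real Tez job also distributes each vertex across the cluster and manages the data movement between vertices. This sketch keeps only the dependency-ordering part.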
Apache Phoenix
Phoenix provides low-latency SQL with full ACID transaction support on top of HBase. You can connect to Phoenix using a JDBC driver on the cluster, or from other applications inside or outside the cluster. To process queries quickly, Phoenix compiles each SQL query into a series of HBase scans, runs those scans in parallel, and assembles the output into a regular JDBC result set.
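The scatter/gather pattern described above can be illustrated with a toy, in-memory model. This is not Phoenix code. The "table" is a sorted Python list standing in for HBase rows, and the split points, predicate and helper names (`scan`, `parallel_query`) are hypothetical. The model splits the row-key space into ranges, runs one scan per range concurrently, and concatenates the pieces into a single ordered result.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for an HBase table: (row_key, columns) sorted by row key.
ROWS = sorted(((f"user{i:03d}", {"visits": i % 7}) for i in range(100)),
              key=lambda row: row[0])
KEYS = [key for key, _ in ROWS]

def scan(start, stop, predicate):
    """One 'scan': rows with start <= key < stop that satisfy the predicate."""
    lo, hi = bisect_left(KEYS, start), bisect_left(KEYS, stop)
    return [(k, v) for k, v in ROWS[lo:hi] if predicate(v)]

def split_ranges(boundaries):
    """Turn sorted split points into consecutive [start, stop) key ranges."""
    return list(zip(boundaries[:-1], boundaries[1:]))

def parallel_query(boundaries, predicate, workers=4):
    """Scatter one scan per key range, then gather into one ordered result."""
    ranges = split_ranges(boundaries)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda r: scan(r[0], r[1], predicate), ranges)
        # map() yields results in submission order, so row-key order is kept.
        return [row for chunk in chunks for row in chunk]

if __name__ == "__main__":
    bounds = ["", "user025", "user050", "user075", "user~"]
    result = parallel_query(bounds, lambda cols: cols["visits"] == 0)
    print(len(result), result[:2])
```

Because the ranges do not overlap and results are gathered in range order, the parallel answer matches a single full scan. The parallelism changes how fast the rows arrive, not which rows are returned.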
This release also updates the following applications:
- HBase 1.2.1 – provides fast, low-latency random access to large datasets.
- Mahout 0.12.0 – provides scalable machine learning and data mining, along with math and statistics features.
- Presto 0.147 – a distributed SQL query engine for large datasets; the new version includes bug fixes.
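These versions ship with the EMR 4.7.0 release, and you select applications when you create a cluster. The following is a minimal sketch that assumes the AWS SDK for Python (boto3) and its `run_job_flow` call. The request asks for a 4.7.0 cluster with Tez and Phoenix, and includes HBase because Phoenix runs on it. The helper name, cluster name, instance type and instance count are hypothetical placeholders. The IAM role names are EMR's usual default roles and may differ in your account. The code only builds and prints the request. The launch call is left as a comment because it needs AWS credentials and starts billable instances.

```python
import json

# Hypothetical helper: builds keyword arguments for boto3's EMR run_job_flow().
# Instance type/count are assumptions; the roles are EMR's common default names.
def build_cluster_request(name, applications,
                          instance_type="m3.xlarge", instance_count=3):
    return {
        "Name": name,
        "ReleaseLabel": "emr-4.7.0",  # release that adds Tez and Phoenix
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "KeepJobFlowAliveWhenNoSteps": True,  # keep cluster up for interactive use
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

if __name__ == "__main__":
    request = build_cluster_request(
        "tez-phoenix-demo", ["Hadoop", "Hive", "Tez", "HBase", "Phoenix"])
    print(json.dumps(request, indent=2))
    # To actually launch (requires AWS credentials and incurs cost):
    #   import boto3
    #   boto3.client("emr").run_job_flow(**request)
```

Keeping the request as plain data makes it easy to review before anything is launched. It also makes it easy to reuse the same definition from other tooling.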
You can start using these features by launching a new EMR 4.7.0 cluster. You can read more on the Amazon EMR page or in the Amazon EMR Developer Guide. Finally, if you want to learn how to use Amazon EMR in your enterprise, we can help: contact our cloud specialists here at PolarSeven.