Hadoop, (Part 2 of 4): ETL and MapReduce

Interactive

BizLibrary

Updated Feb 04, 2020

In this course, Hadoop expert Kevin McCarty takes a closer look at some of the major components underpinning Hadoop: services such as Mahout, Oozie, and ZooKeeper, and languages such as Pig and Hive. He examines the Hadoop architecture and looks at some of the ETL tools Hadoop provides for moving data between a Hadoop cluster and external servers. Finally, McCarty demonstrates a simple application in Java and follows it with a deep dive into MapReduce, including a look at automation using the Linux cron utility.

Lesson 1:

Where Do You Find Big Data?
Big Data Sources - Volume
Big Data Sources - Variety
Structured Data
Semi-Structured
Unstructured Data
Problems with Big Data
Data Integrity
Data Completeness
Data Format
Data Timeliness
How Do We Process Big Data?
What Is ETL? - Extraction
What Is ETL? - Transform
What Is ETL? - Load

Lesson 2:

In This Exercise...
Demo: Sqoop
Demo: Working with Tables
Demo: ETL

Lesson 3:

What Is MapReduce?
History of MapReduce
MapReduce - Benefits
MapReduce - Limitations
Demo: MapReduce
Demo: Create a Jar File

Lesson 4:

Demo: MapReduce Setup
Demo: Word Count Program

Lesson 5:

Language Support
How Streaming Works
Creating a MapReduce Application
MapReduce - Execution
MapReduce - Main
MapReduce - The Mapper
MapReduce - The Reducer
Demo: Create Java File
Demo: MapReduce
Demo: Map Method
Demo: Reduce Function
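
For readers who want a feel for the mapper, reducer, and main pieces listed in this lesson before taking the course, here is a minimal word-count sketch in plain Java. It is not the course's demo code and it does not use the Hadoop API: the class name WordCountSketch and the methods map, reduce, and run are hypothetical names chosen for illustration, and the "shuffle" step is simulated with an in-memory map rather than a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: simulates the map -> shuffle -> reduce flow in memory.
// A real Hadoop job would extend Hadoop's Mapper and Reducer classes instead.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all counts that were grouped under one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: run the mapper on each line, group values by key
    // (the "shuffle"), then run the reducer once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the per-word counts for two sample lines.
        System.out.println(run(List.of("the cat sat", "The dog sat")));
    }
}
```

The split between map, reduce, and run mirrors the Mapper, Reducer, and Main topics above; on a real cluster the grouping step is handled by the framework rather than by application code.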

Lesson 6:

Ad-Hoc vs. Scheduling
Cron Jobs
Cron Tables
Creating a Cron Job
Example Cron Job Text
Demo: Cron Scheduling

Related learning

Hadoop, (Part 1 of 4): Introduction and HDFS (Interactive ⋅ 102 mins)

Hadoop, (Part 3 of 4): YARN and NiFi (Interactive ⋅ 132 mins)

Hands-On PySpark for Big Data Analysis (Course ⋅ 120 mins)

Working with Big Data in Python (Course ⋅ 180 mins)
