Hadoop, Part 2 of 4: ETL and MapReduce

LearnNow Online
Updated Aug 21, 2018

Course description

In this course, Hadoop expert Kevin McCarty takes a closer look at some of the major components underpinning Hadoop – services such as Mahout, Oozie, and ZooKeeper, and languages such as Pig and Hive. He will examine the Hadoop architecture and look at some of the ETL tools Hadoop provides for moving data between a Hadoop cluster and external servers. Finally, McCarty will demonstrate a simple application in Java and follow that up with a deep dive into MapReduce, including a look at automation using the Linux cron utility.

Each LearnNowOnline training course is made up of Modules (typically an hour in length). Within each module there are Topics (typically 15-30 minutes each) and Subtopics (typically 2-5 minutes each). There is a Post Exam for each Module that must be passed with a score of 70% or higher to successfully and fully complete the course.


Prerequisites

This course assumes that students have some programming background and some familiarity with a Unix-based operating system. No specific experience with the Java programming language or Hadoop is required. As with any such course, the more experience you bring to the course, the more you’ll get out of it. The course moves quickly through a broad range of topics. It also assumes that you are familiar with the version of Windows that you are running. For example, the course might say simply “Open PuTTY” without explaining how to do that. You should also be able to navigate the folder hierarchy using Windows Explorer.


Meet the expert

Kevin McCarty

Kevin McCarty is a computer professional with over 30 years of experience in the industry as a programmer, project manager, database administrator, architect, and data scientist. He is a Microsoft Certified Trainer with over 25 individual certifications in programming and database technologies and serves as the chapter leader of the Boise SQL Server Users Group. A former Army officer and Eagle Scout, he holds a doctorate in Computer Science and has a lifelong love of learning.

Video Runtime

100 Minutes

Time to complete

170 Minutes

Course Outline

ETL and MapReduce

Big Data Sources and ETL (19:11)

  • Introduction (00:28)
  • Where Do You Find Big Data? (00:46)
  • Big Data Sources - Volume (01:02)
  • Big Data Sources - Variety (03:02)
  • Structured Data (00:43)
  • Semi-Structured (00:26)
  • Unstructured Data (00:24)
  • Problems with Big Data (00:32)
  • Data Integrity (02:21)
  • Data Completeness (00:47)
  • Data Format (01:22)
  • Data Timeliness (00:57)
  • How Do We Process Big Data? (01:08)
  • What Is ETL? - Extraction (00:43)
  • What Is ETL? - Transform (02:48)
  • What Is ETL? - Load (01:08)
  • Summary (00:26)

ETL Demonstration (15:15)

  • Introduction (00:30)
  • In This Exercise... (00:09)
  • Demo: Sqoop (04:43)
  • Demo: Working with Tables (04:56)
  • Demo: ETL (04:32)
  • Summary (00:23)

Understanding MapReduce (16:55)

  • Introduction (00:24)
  • What Is MapReduce? (00:51)
  • History of MapReduce (04:42)
  • MapReduce - Benefits (01:43)
  • MapReduce - Limitations (02:25)
  • Demo: MapReduce (04:33)
  • Demo: Create a Jar File (01:48)
  • Summary (00:25)

MapReduce Demonstration (09:54)

  • Introduction (00:28)
  • Demo: MapReduce Setup (04:08)
  • Demo: Word Count Program (04:47)
  • Summary (00:29)

Developing MapReduce (28:56)

  • Introduction (00:25)
  • Language Support (00:56)
  • How Streaming Works (01:02)
  • Creating a MapReduce Application (00:35)
  • MapReduce - Execution (01:52)
  • MapReduce - Main (01:05)
  • MapReduce - The Mapper (00:42)
  • MapReduce - The Reducer (01:26)
  • Demo: Create Java File (06:02)
  • Demo: MapReduce (05:06)
  • Demo: Map Method (03:11)
  • Demo: Reduce Function (05:57)
  • Summary (00:30)

Schedule MapReduce (10:05)

  • Introduction (00:29)
  • Ad-Hoc vs. Scheduling (01:44)
  • Cron Jobs (01:00)
  • Cron Tables (00:31)
  • Creating a Cron Job (01:08)
  • Example Cron Job Text (00:37)
  • Demo: Cron Scheduling (04:09)
  • Summary (00:23)

Bundle Code: LNO1234
