Intro to Hadoop and MapReduce

COURSE
Cloudera

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. Learn the fundamental principles behind it, and how you can use its power to make sense of your Big Data.

Why Take This Course?

  • See how Hadoop fits into the world (recognize the problems it solves)
  • Understand the concepts of HDFS and MapReduce (find out how Hadoop solves those problems)
  • Write MapReduce programs (see how we solve the problems)
  • Practice solving problems on your own

Prerequisites and Requirements

Lesson 1 does not have technical prerequisites and is a good overview of Hadoop and MapReduce for managers.

To get the most out of the class, however, you need basic Python programming skills at the level provided by introductory courses such as our Introduction to Computer Science course.

To learn more about Hadoop, you can also check out the book Hadoop: The Definitive Guide.

What Will I Learn?

Final Project

In this project you will work with data from a discussion forum (sometimes called a discussion board), a type of user-generated content found all over the web. Most popular websites have some kind of forum, so the techniques you practice in this project transfer to similar projects.

Syllabus

Lesson 1

What is "Big Data"? The dimensions of Big Data. Scaling problems. HDFS and the Hadoop ecosystem.

Lesson 2

The basics of HDFS, MapReduce, and the Hadoop cluster.

Lesson 3

Writing MapReduce programs to answer questions about data.
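
To give a feel for what such a program looks like, here is a minimal sketch in Python. It is not taken from the course material: the function names (mapper, reducer, run_job), the tab-separated "store, amount" record layout, and the sample data are all assumptions made for illustration. It simulates the map, shuffle/sort, and reduce phases in a single local process instead of running on a Hadoop cluster.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit a (store, amount) pair for each well-formed tab-separated record."""
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) != 2:
            continue  # skip malformed records instead of failing the whole job
        store, amount = fields
        yield store, float(amount)


def reducer(pairs):
    """Sum the amounts per key; pairs must arrive sorted by key, as after a shuffle."""
    for store, group in groupby(pairs, key=itemgetter(0)):
        yield store, sum(amount for _, amount in group)


def run_job(lines):
    """Simulate map, shuffle/sort, and reduce locally (hypothetical helper)."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    # Hypothetical sample records: store name, a tab, then a sale amount.
    sample = ["Austin\t10.0", "Boston\t5.5", "Austin\t2.5", "bad record"]
    print(run_job(sample))
```

The mapper and reducer only communicate through key-value pairs, which is what lets a framework run many copies of each in parallel and handle the sorting in between.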

Lesson 4

MapReduce design patterns.

Final Project

Answering questions about big sales data and analyzing large website logs.
