The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. Learn the fundamental principles behind it and how to use its power to make sense of your Big Data.
Lesson 1 does not have technical prerequisites and is a good overview of Hadoop and MapReduce for managers.
To get the most out of the class, however, you need basic Python programming skills at the level taught in introductory courses such as our Introduction to Computer Science course.
To learn more about Hadoop, you can also check out the book Hadoop: The Definitive Guide.
In this project you will work with discussion forum (also called discussion board) data. Forum posts are one type of user-generated content found all around the web. Most popular websites have some kind of forum, so the techniques you use in this project transfer to similar projects.
What is "Big Data"? The dimensions of Big Data. Scaling problems. HDFS and the Hadoop ecosystem.
The basics of HDFS, MapReduce, and Hadoop clusters.
Writing MapReduce programs to answer questions about data.
MapReduce design patterns.
Answering questions about big sales data and analyzing large website logs.
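To make "writing MapReduce programs to answer questions about data" concrete, here is a minimal sketch in Python (the language the class uses) of a job that totals sales per store. It assumes a hypothetical tab-separated record layout (date, time, store, item, cost, payment) and simulates the map, shuffle-and-sort, and reduce phases in a single process. The function names `mapper`, `reducer`, and `run_local` are illustrative only and are not part of any Hadoop API.

```python
"""Minimal sketch of a MapReduce job written in the Hadoop Streaming style.

Assumption: each input record is a tab-separated sales line with the
hypothetical layout: date, time, store, item, cost, payment.
"""
from itertools import groupby


def mapper(lines):
    """Map phase: emit 'store<TAB>cost' for every well-formed record."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip malformed records instead of crashing the job
        _date, _time, store, _item, cost, _payment = fields
        yield f"{store}\t{cost}"


def reducer(sorted_lines):
    """Reduce phase: input is sorted by key, so equal keys are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for store, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(float(cost) for _, cost in group)
        yield f"{store}\t{total:.2f}"


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in one process."""
    # Sorting by key stands in for the shuffle-and-sort Hadoop performs.
    mapped = sorted(mapper(lines), key=lambda kv: kv.split("\t", 1)[0])
    return list(reducer(mapped))
```

Under Hadoop Streaming, the mapper and reducer would each live in its own script that reads lines from standard input and prints key-value lines to standard output; the framework sorts the mapper output by key before passing it to the reducer, which is why the reducer can total each store with a single pass over adjacent lines.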