Hands-On PySpark for Big Data Analysis

Updated Jan 20, 2020

Data is an incredible asset, especially when there is a lot of it. Exploratory data analysis, business intelligence, and machine learning all depend on processing and analyzing Big Data at scale.

How do you go from working on prototypes on your local machine to handling messy data in production and at scale?

This is a practical, hands-on course that shows you how to use Spark and its Python API to create performant analytics with large-scale data. Instead of reinventing the wheel, wow your clients by building robust and responsible applications on Big Data.

Target Audience 

This course is for developers, data scientists, business analysts, or anyone who needs to reliably analyze large amounts of messy real-world data. Whether you’re tasked with creating your company's business intelligence function, building great data platforms for your machine learning models, or looking to use code to magnify the impact of your business, this course is for you. The only prerequisites are familiarity with basic Python and a desire to seek insight from Big Data.

Business Outcomes 

  • Work with large amounts of data with agility using distributed datasets and in-memory caching 

  • Source data from popular storage systems and formats, including HDFS, Hive, JSON, and S3

  • Deploy Big Data analytics to production using PySpark’s easy-to-use API