Flight Delays Demo: Exploratory Data Analysis
This demo shows how data can be summarized, visualized, and analyzed using Spark DataFrames and Python visualization tools (Seaborn).
Details from website:
Before you can build a machine learning model, you first need data. Most real-world data, however, requires some processing and transformation before it can be used. ETL is a type of data integration that uses three steps (extract, transform, and load) to blend data from multiple sources.
During the process, data is taken from a source system (extracted), converted into a format that can be analyzed (transformed), and stored (loaded) into a destination system. In this step of the project, we will convert our dataset to a format that can be analyzed efficiently.
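
The three ETL steps described above can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not the pipeline used in this demo: the sample rows, the column names (`carrier`, `origin`, `dest`, `dep_delay`), and the choice of an in-memory SQLite table as the destination are all assumptions made for the example, and it deliberately avoids Spark so that it runs anywhere.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: a tiny CSV with made-up flight rows.
RAW_CSV = """carrier,origin,dest,dep_delay
AA,JFK,LAX,12
DL,ATL,ORD,
UA,SFO,SEA,-3
"""


def extract(text):
    """Extract: read raw rows from the source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing delay and cast the delay to int."""
    cleaned = []
    for row in rows:
        if row["dep_delay"] == "":
            continue  # no recorded delay, so skip this row
        cleaned.append(
            (row["carrier"], row["origin"], row["dest"], int(row["dep_delay"]))
        )
    return cleaned


def load(records, conn):
    """Load: store the cleaned records in the destination table."""
    conn.execute(
        "CREATE TABLE flights "
        "(carrier TEXT, origin TEXT, dest TEXT, dep_delay INTEGER)"
    )
    conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?)", records)
    conn.commit()


# Run the three steps in order, then query the destination.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
avg_delay = conn.execute("SELECT AVG(dep_delay) FROM flights").fetchone()[0]
print(avg_delay)
```

Keeping each step in its own function makes it easy to swap one out, for example replacing the SQLite destination with a columnar file format, without touching the other two.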