DataFrames and Spark SQL
Learning Objectives: In this module, you will learn about Spark SQL, which is used to process structured data with SQL queries. You will study DataFrames and Datasets in Spark SQL, along with the different kinds of SQL operations performed on DataFrames. You will also learn about Spark and Hive integration.
Topics:
- Need for Spark SQL
- What is Spark SQL?
- Spark SQL Architecture
- SQLContext in Spark SQL
- User Defined Functions
- DataFrames & Datasets
- Interoperating with RDDs
- JSON and Parquet File Formats
- Loading Data from Different Sources
- Spark – Hive Integration
Hands-on:
- Spark SQL – Creating DataFrames
- Loading and Transforming Data from Different Sources
- Stock Market Analysis
- Spark-Hive Integration