Apache Spark and Scala Certification Training

Course Description

Curriculum

  • Learning Objectives: Understand Big Data and the components used to handle it, such as HDFS. You will learn about Hadoop cluster architecture, get an introduction to Spark, and understand the difference between batch processing and real-time processing.

    Topics:

    What is Big Data?
    Big Data Customer Scenarios
    Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case
    How Does Hadoop Solve the Big Data Problem?
    What is Hadoop?
    Hadoop’s Key Characteristics
    Hadoop Ecosystem and HDFS
    Hadoop Core Components
    Rack Awareness and Block Replication
    YARN and its Advantage
    Hadoop Cluster and its Architecture
    Hadoop: Different Cluster Modes
    Big Data Analytics with Batch & Real-time Processing
    Why is Spark Needed?
    What is Spark?
    How Does Spark Differ from Other Frameworks?
    Spark at Yahoo!

  • Learning Objectives: Learn the basics of Scala that are required for programming Spark applications. You will also learn about the basic constructs of Scala such as variable types, control structures, collections such as Array, ArrayBuffer, Map, Lists, and many more.

    Topics:

    What is Scala?
    Why Scala for Spark?
    Scala in other Frameworks
    Introduction to Scala REPL
    Basic Scala Operations
    Variable Types in Scala
    Control Structures in Scala
    Foreach loop, Functions and Procedures
    Collections in Scala: Array, ArrayBuffer, Map, Tuples, Lists, and more

    Hands-on:

    Scala REPL Detailed Demo
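
The constructs named in this module can be tried out with a few lines of plain Scala. The following is a minimal sketch, not course material: the object name `ScalaBasicsSketch` and its helper methods are hypothetical and chosen only for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration of the basic constructs listed above.
object ScalaBasicsSketch {
  // val declares an immutable value; var (used further down) can be reassigned.
  val courseName: String = "Spark and Scala"

  // if/else is an expression, so it yields a value directly.
  def parity(n: Int): String = if (n % 2 == 0) "even" else "odd"

  // Array has a fixed length; ArrayBuffer can grow.
  val fixed: Array[Int] = Array(1, 2, 3)
  def grow(): ArrayBuffer[Int] = {
    val buf = ArrayBuffer(1, 2, 3)
    buf += 4 // append in place
    buf
  }

  // Map of key-value pairs, and a tuple holding two values of different types.
  val ports: Map[String, Int] = Map("http" -> 80, "https" -> 443)
  val pair: (String, Int) = ("http", 80)

  // foreach applies a function to every element of a List.
  def sumWithForeach(xs: List[Int]): Int = {
    var total = 0
    xs.foreach(x => total += x)
    total
  }

  def main(args: Array[String]): Unit = {
    println(parity(7))                     // odd
    println(grow())                        // ArrayBuffer(1, 2, 3, 4)
    println(ports("https"))                // 443
    println(sumWithForeach(List(1, 2, 3))) // 6
  }
}
```

The same expressions can be pasted one at a time into the Scala REPL to see the inferred type of each result.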

  • Learning Objectives: In this module, you will learn about object-oriented programming and functional programming techniques in Scala.

    Topics:

    Functional Programming
    Higher Order Functions
    Anonymous Functions
    Class in Scala
    Getters and Setters
    Custom Getters and Setters
    Properties with only Getters
    Auxiliary Constructor and Primary Constructor
    Singletons
    Extending a Class
    Overriding Methods
    Traits as Interfaces and Layered Traits

    Hands-on:

    OOP Concepts
    Functional Programming
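
Several of the topics above fit into one short example. This is a minimal sketch under stated assumptions: `Greeter`, `Person`, `Student`, and `applyTwice` are hypothetical names invented for illustration and do not come from the course.

```scala
// Hypothetical illustration of the OOP and functional topics listed above.
object OopFpSketch { // an object is a singleton: exactly one instance exists

  // Trait used as an interface: one abstract member, one concrete method.
  trait Greeter {
    def name: String
    def greet: String = s"Hello, $name"
  }

  // The primary constructor lives in the class header.
  class Person(val name: String, private var _age: Int) extends Greeter {
    // Auxiliary constructor, which must call another constructor first.
    def this(name: String) = this(name, 0)

    // Custom getter and setter; the setter ignores negative values.
    def age: Int = _age
    def age_=(value: Int): Unit = if (value >= 0) _age = value
  }

  // Extending a class and overriding a method.
  class Student(studentName: String) extends Person(studentName) {
    override def greet: String = s"Hi, $name (student)"
  }

  // Higher-order function: takes another function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  def main(args: Array[String]): Unit = {
    val p = new Person("Ada")
    p.age = 30  // calls age_=
    p.age = -5  // rejected by the custom setter
    println(p.age)                      // 30
    println(p.greet)                    // Hello, Ada
    println(new Student("Lin").greet)   // Hi, Lin (student)
    println(applyTwice(n => n * 2, 5))  // 20, using an anonymous function
  }
}
```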

  • Learning Objectives: Understand Apache Spark and learn how to develop Spark applications. At the end, you will learn how to perform data ingestion using Sqoop.

    Topics:

    Spark’s Place in Hadoop Ecosystem
    Spark Components & its Architecture
    Spark Deployment Modes
    Introduction to Spark Shell
    Writing your first Spark Job Using SBT
    Submitting Spark Job
    Spark Web UI
    Data Ingestion using Sqoop

    Hands-on:

    Building and Running Spark Application
    Spark Application Web UI
    Configuring Spark Properties
    Data ingestion using Sqoop

  • Learning Objectives: Gain insight into Spark RDDs and the RDD operations used to implement business logic (transformations, actions, and functions performed on RDDs).

    Topics:

    Challenges in Existing Computing Methods
    Probable Solution & How RDD Solves the Problem
    What is an RDD, Its Operations, Transformations & Actions
    Data Loading and Saving Through RDDs
    Key-Value Pair RDDs
    Other Pair RDDs, Two Pair RDDs
    RDD Lineage
    RDD Persistence
    WordCount Program Using RDD Concepts
    RDD Partitioning & How It Helps Achieve Parallelization
    Passing Functions to Spark

    Hands-on:

    Loading data in RDDs
    Saving data through RDDs
    RDD Transformations
    RDD Actions and Functions
    RDD Partitions
    WordCount through RDDs
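
The shape of a WordCount job can be previewed without a cluster. The sketch below is an assumption-laden stand-in: it uses ordinary Scala collections rather than Spark, and `WordCountSketch` and `wordCount` are hypothetical names. On an RDD, the grouping and summing steps would typically be a single `reduceByKey(_ + _)` call.

```scala
// Hypothetical illustration: WordCount on local Scala collections, NOT on an RDD.
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))  // split each line into words and flatten
      .filter(_.nonEmpty)        // drop empty strings left by leading whitespace
      .map(word => (word, 1))    // build key-value pairs, as with a pair RDD
      .groupBy(_._1)             // local stand-in for grouping by key
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("to be or not to be"))
    println(counts("to")) // 2
    println(counts("or")) // 1
  }
}
```

The flatMap and map steps correspond to transformations, while collecting or printing the final counts plays the role of an action.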

  • Learning Objectives: In this module, you will learn about Spark SQL, which is used to process structured data with SQL queries. You will work with DataFrames and Datasets in Spark SQL and the different kinds of SQL operations performed on DataFrames. You will also learn about Spark and Hive integration.

    Topics:

    Need for Spark SQL
    What is Spark SQL?
    Spark SQL Architecture
    SQL Context in Spark SQL
    User Defined Functions
    Data Frames & Datasets
    Interoperating with RDDs
    JSON and Parquet File Formats
    Loading Data through Different Sources
    Spark – Hive Integration

    Hands-on:

    Spark SQL – Creating Data Frames
    Loading and Transforming Data through Different Sources
    Stock Market Analysis
    Spark-Hive Integration

  • Learning Objectives: Learn why machine learning is needed, the different machine learning techniques and algorithms, and Spark MLlib.

    Topics:

    Why Machine Learning?
    What is Machine Learning?
    Where is Machine Learning Used?
    Face Detection: Use Case
    Different Types of Machine Learning Techniques
    Introduction to MLlib
    Features of MLlib and MLlib Tools
    Various ML algorithms supported by MLlib

  • Learning Objectives: Implement various algorithms supported by MLlib such as Linear Regression, Decision Tree, Random Forest and many more.

    Topics:

    Supervised Learning – Linear Regression, Logistic Regression, Decision Tree, Random Forest
    Unsupervised Learning – K-Means Clustering & How It Works with MLlib
    Analysis on US Election Data using MLlib (K-Means)

    Hands-on:

    Machine Learning MLlib
    K-Means Clustering
    Linear Regression
    Logistic Regression
    Decision Tree
    Random Forest

  • Learning Objectives: Understand Kafka and its architecture. Learn about the Kafka cluster and how to configure different types of Kafka clusters. Get introduced to Apache Flume, its architecture, and how it is integrated with Apache Kafka for event processing. At the end, learn how to ingest streaming data using Flume.

    Topics:

    Need for Kafka
    What is Kafka?
    Core Concepts of Kafka
    Kafka Architecture
    Where is Kafka Used?
    Understanding the Components of Kafka Cluster
    Configuring Kafka Cluster
    Kafka Producer and Consumer Java API
    Need for Apache Flume
    What is Apache Flume?
    Basic Flume Architecture
    Flume Sources
    Flume Sinks
    Flume Channels
    Flume Configuration
    Integrating Apache Flume and Apache Kafka

    Hands-on:

    Configuring Single Node Single Broker Cluster
    Configuring Single Node Multi Broker Cluster
    Producing and consuming messages
    Flume Commands
    Setting up Flume Agent
    Streaming Twitter Data into HDFS

  • Learning Objectives: Work with Spark Streaming, which is used to build scalable, fault-tolerant streaming applications. Also, learn about DStreams and the various transformations performed on streaming data. You will get to know commonly used streaming operators such as sliding window operators and stateful operators.

    Topics:

    Drawbacks in Existing Computing Methods
    Why Streaming is Necessary?
    What is Spark Streaming?
    Spark Streaming Features
    Spark Streaming Workflow
    How Uber Uses Streaming Data
    Streaming Context & DStreams
    Transformations on DStreams
    Windowed Operators and Why They Are Useful
    Important Windowed Operators
    Slice, Window and ReduceByWindow Operators
    Stateful Operators
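
The idea behind a sliding window can be sketched on a local collection. This is only an analogy under stated assumptions: it does not use Spark Streaming, each element stands for one micro-batch total, and `WindowSketch` and `sumByWindow` are hypothetical names. In Spark Streaming, the window length and slide interval are durations rather than element counts.

```scala
// Hypothetical illustration: a sliding-window sum on a local Seq, NOT on a DStream.
object WindowSketch {
  // Sums every run of `windowLength` consecutive batches, advancing `slide` batches each time.
  // Note: `sliding` may emit a shorter final window when leftover elements remain.
  def sumByWindow(batches: Seq[Int], windowLength: Int, slide: Int): List[Int] =
    batches.sliding(windowLength, slide).map(_.sum).toList

  def main(args: Array[String]): Unit = {
    val batchTotals = Seq(1, 2, 3, 4, 5)
    println(sumByWindow(batchTotals, 3, 1)) // List(6, 9, 12)
    println(sumByWindow(batchTotals, 3, 2)) // List(6, 12)
  }
}
```

Because consecutive windows overlap whenever the slide is smaller than the window length, the same batch contributes to more than one result.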

  • Learning Objectives: In this module, you will learn about the different streaming data sources such as Kafka and Flume. At the end of the module, you will be able to create a Spark Streaming application.

    Topics:

    Apache Spark Streaming: Data Sources
    Streaming Data Source Overview
    Apache Flume and Apache Kafka Data Sources
    Example: Using a Kafka Direct Data Source
    Perform Twitter Sentiment Analysis Using Spark Streaming

    Hands-on:

    Different Streaming Data Sources

  • Learning Objectives: Work on an end-to-end Financial domain project covering all the major concepts of Spark taught during the course.

  • Learning Objectives: In this module, you will be learning the key concepts of Spark GraphX programming and operations along with different GraphX algorithms and their implementations.
