Cloud Dataflow for Data Processing

Goal: In this module, you will learn how to develop and execute a variety of data processing patterns using Dataflow, and how to manage clusters using the Dataproc service.

Objective: Upon completing this module, you should be able to:

  • Build a Dataflow pipeline
  • Create a Maven project with the Dataflow SDK
  • Create and execute a streaming pipeline using a Dataflow template
  • Create a pipeline on Apache Beam
  • Test a pipeline
  • Create, manage, and delete clusters using the Dataproc service
  • Run a job on a cluster
  • Use APIs to automate jobs

Topics:

  • Dataflow services
  • Stream and batch processing
  • Apache Beam SDK
  • Monitoring using Stackdriver
  • Data transformation with Cloud Dataflow
  • Working with Dataproc
  • Creating a cluster
  • Managing clusters
  • Automation of jobs