

Training Curriculum

Day 1
- Introduction to Apache Hadoop
- Hadoop Distributed File System (HDFS)
- MapReduce Concepts and Execution
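As a preview of the MapReduce model covered on Day 1, here is a minimal word-count sketch in plain Python, with the map, shuffle, and reduce phases simulated in-process. No Hadoop cluster is required, and all function names and data are illustrative, not Hadoop API:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle_phase(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["the"])  # "the" appears twice across the input lines
```

On a real cluster the map and reduce phases run in parallel across many machines; the in-process version above only shows the data flow between the phases.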

Day 2
- Introduction to Apache Spark
- PySpark Basics: RDDs (Resilient Distributed Datasets)
- DataFrames and SQL in PySpark

Day 3
- Advanced PySpark Concepts: Transformations and Actions
- PySpark Streaming
- Data Analytics with PySpark
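Day 3's distinction between transformations (lazy) and actions (eager) can be previewed with plain-Python generators, which defer work much as Spark defers transformations until an action forces evaluation. This is an analogy for the execution model only, not PySpark API:

```python
log = []

def numbers(n):
    # "Source": like an RDD, nothing is computed until something consumes it.
    for i in range(n):
        log.append(i)       # record when each element is actually produced
        yield i

# "Transformations": building the pipeline triggers no computation yet.
squared = (x * x for x in numbers(5))
evens = (x for x in squared if x % 2 == 0)
assert log == []            # still lazy: no element has been produced

# "Action": collecting the results forces the whole pipeline to run.
result = list(evens)
print(result)               # [0, 4, 16]
assert log == [0, 1, 2, 3, 4]
```

In PySpark the same pattern appears as chained transformations (e.g. `map`, `filter`) that execute only when an action such as `collect()` or `count()` is called.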

Day 4
- Integration of PySpark and Prefect
- Building Data Pipelines with PySpark and Prefect
- Real-World Use Cases and Best Practices
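Day 4 wires PySpark jobs into Prefect flows. The shape of such a pipeline can be sketched with ordinary functions, no Prefect or Spark installed: in Prefect, each step below would typically be decorated as a task and the driver as a flow. All names and data here are illustrative:

```python
def extract():
    # Step 1: pull raw records (a real flow might read from HDFS or a database).
    return [{"user": "a", "amount": 10},
            {"user": "b", "amount": -3},
            {"user": "a", "amount": 5}]

def transform(records):
    # Step 2: drop invalid rows and aggregate per user
    # (the kind of work a PySpark job would do at scale).
    totals = {}
    for row in records:
        if row["amount"] > 0:
            totals[row["user"]] = totals.get(row["user"], 0) + row["amount"]
    return totals

def load(totals):
    # Step 3: deliver results (here, just return them; a real task might write a table).
    return sorted(totals.items())

def pipeline():
    # The driver: orchestrating this in Prefect adds retries,
    # scheduling, and observability around each step.
    return load(transform(extract()))

result = pipeline()
print(result)  # [('a', 15)] -- user b's negative amount is filtered out
```

The point of an orchestrator is exactly this separation: each step stays a small testable function, while the framework handles ordering, failure handling, and scheduling.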

Day 5
- Comprehensive Project: Applying PySpark and Hadoop
- Q&A and Course Wrap-Up