This is a complete PySpark Developer course for Data Engineers, Data Scientists, and anyone else who wants to process Big Data effectively. We will cover the topics below and more:
Complete Curriculum for a successful PySpark Developer
Complete Flow of Installation of PySpark
Introduction to Spark (Why Spark was Developed, Spark Features, Spark Components)
Spark RDD Fundamentals
How to Create RDDs
RDD Operations (Transformations & Actions)
Spark Cluster Architecture - Execution, YARN, JVM Processes, DAG Scheduler, Task Scheduler
Spark Shared Variables (Broadcast and Accumulators)
Spark SQL Architecture, Catalyst Optimizer, Volcano Iterator Model, Tungsten Execution Engine, Different Benchmarks
Commonly Used Spark Functions - version, range, createDataFrame, sql, table, sparkContext, conf, read, udf, newSession, stop, catalog, etc.
DataFrame Built-in functions - new column, encryption, string, regexp, date, null, collection, na, math and statistics, explode, flatten, formatting and json
What Are Partition, Repartition, and Coalesce
Repartition Vs Coalesce
Extraction - CSV file, text file, Parquet file, ORC file, JSON file, Avro file, Hive, JDBC
DataFrame Fundamentals (What is a DataFrame, DataFrame Sources, DataFrame Features, DataFrame Organization)
DataFrame Rows, Columns and DataTypes. Practical examples.
ETL Using DataFrame (Extraction APIs, Transformation APIs, and Loading APIs). Practical Examples.
Optimization and Management - Join Strategies, Driver Configuration, Parallelism Configurations, Executor Configuration, etc.
HDFS Commands (Will be added shortly)
Python Fundamentals (Will be added shortly)
More will be added