Course Outline

Introduction:

  • Apache Spark within the Hadoop Ecosystem
  • Brief overview of Python and Scala

Core Concepts (Theory):

  • Architecture
  • Resilient Distributed Datasets (RDDs)
  • Transformations and Actions (see the sketch after this list)
  • Stages, Tasks, and Dependencies
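
The core-concepts section hinges on the difference between lazy transformations and eager actions. The following is a minimal PySpark sketch of that behavior, not course material; it assumes a local Spark installation, and the names and values are purely illustrative.

```python
# Transformations vs. actions: transformations only record lineage;
# actions trigger execution as stages and tasks.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-basics").getOrCreate()
sc = spark.sparkContext

numbers = sc.parallelize(range(1, 11))          # create an RDD

# Transformations are lazy: nothing runs yet, Spark only records
# the lineage (a DAG of dependencies).
evens = numbers.filter(lambda n: n % 2 == 0)
squares = evens.map(lambda n: n * n)

# Actions trigger execution: the lineage is split into stages at
# shuffle boundaries and run as tasks on the executors.
print(squares.collect())   # [4, 16, 36, 64, 100]
print(squares.sum())       # 220

spark.stop()
```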

Practical Workshop: Basics in the Databricks Environment

  • RDD API exercises (illustrated in the sketch after this list):
    • Core transformation and action functions
    • PairRDDs
    • Join operations
    • Effective caching strategies
  • DataFrame API exercises (also covered in the sketch below):
    • Spark SQL
    • DataFrame operations: select, filter, group, and sort
    • User-Defined Functions (UDFs)
  • Exploration of the Dataset API
  • Streaming capabilities
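
As a taste of these exercises, here is a minimal PySpark sketch running through pair RDDs, a join, caching, DataFrame operations, Spark SQL, and a UDF. All dataset names and values are invented for illustration; they are not the actual course exercises.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("workshop-basics").getOrCreate()
sc = spark.sparkContext

# Pair RDDs: reduceByKey aggregates per key, join matches keys.
sales = sc.parallelize([("apples", 3), ("pears", 2), ("apples", 5)])
prices = sc.parallelize([("apples", 1.20), ("pears", 0.80)])

totals = sales.reduceByKey(lambda a, b: a + b)   # ("apples", 8), ("pears", 2)
joined = totals.join(prices)                     # key -> (quantity, price)

# Caching keeps the joined RDD in memory for reuse by both actions below.
joined.cache()
print(joined.mapValues(lambda qp: qp[0] * qp[1]).collect())
print(joined.count())

# DataFrames: select, filter, group, and sort.
df = spark.createDataFrame(
    [("Alice", "sales", 3000), ("Bob", "sales", 4000), ("Cara", "hr", 3500)],
    ["name", "dept", "salary"],
)
(df.select("dept", "salary")
   .filter(F.col("salary") > 3000)
   .groupBy("dept")
   .agg(F.avg("salary").alias("avg_salary"))
   .orderBy("dept")
   .show())

# Spark SQL over the same data via a temporary view.
df.createOrReplaceTempView("employees")
spark.sql("SELECT dept, COUNT(*) AS n FROM employees GROUP BY dept").show()

# A user-defined function (UDF) applied as a new column.
shout = F.udf(lambda s: s.upper(), StringType())
df.withColumn("name_upper", shout("name")).show()

spark.stop()
```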

Practical Workshop: Deployment in the AWS Environment

  • Foundations of AWS Glue
  • Comparing AWS EMR and AWS Glue
  • Sample jobs in both environments (a minimal job sketch follows this list)
  • Evaluating the advantages and disadvantages of each service
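
For a sense of what a deployed job looks like, here is a minimal PySpark batch script of the kind compared in this workshop. The S3 bucket and paths are hypothetical placeholders; the same script can be submitted to an EMR cluster with spark-submit, while on Glue it would typically be wrapped in the GlueContext and Job boilerplate that the service generates.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-orders-etl").getOrCreate()

# Read raw CSV data from S3 (hypothetical bucket and path).
orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("s3://example-bucket/raw/orders/"))

# Aggregate order amounts per day.
daily = (orders
         .groupBy("order_date")
         .agg(F.sum("amount").alias("total_amount")))

# Write the result back to S3 as Parquet (hypothetical path).
daily.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_orders/")

spark.stop()
```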

Supplementary Content:

  • Introduction to Apache Airflow orchestration (see the DAG sketch below)
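
As a preview of the orchestration topic, here is a minimal Airflow DAG sketch (assuming Airflow 2.x) that schedules a Spark job once a day. The DAG id, task, and script path are hypothetical placeholders, not part of the course environment.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_spark_etl",          # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="run_etl",
        # Hypothetical path to the Spark job script on the cluster.
        bash_command="spark-submit /opt/jobs/daily_orders_etl.py",
    )
```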

Requirements

  • Programming proficiency (preferably in Python or Scala)
  • Foundational knowledge of SQL

Duration

21 hours
