
Apache Spark™ - Unified Engine for large-scale data analytics
Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
Overview - Spark 4.1.0 Documentation
If you’d like to build Spark from source, visit Building Spark. Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS), and it should run on any platform that runs a supported version of Java.
Documentation - Apache Spark
The documentation linked to above covers getting started with Spark, as well the built-in components MLlib, Spark Streaming, and GraphX. In addition, this page lists other resources for learning Spark.
Quick Start - Spark 4.1.0 Documentation
Spark’s shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. It is available in either Scala (which runs on the Java VM and is thus a good way to use …
Examples - Apache Spark
Spark allows you to perform DataFrame operations with programmatic APIs, write SQL, perform streaming analyses, and do machine learning. Spark saves you from learning multiple frameworks …
PySpark Overview — PySpark 4.1.0 documentation - Apache Spark
Dec 11, 2025 · Spark Connect is a client-server architecture within Apache Spark that enables remote connectivity to Spark clusters from any application. PySpark provides the client for the Spark …
Spark SQL & DataFrames | Apache Spark
Spark SQL includes a cost-based optimizer, columnar storage and code generation to make queries fast. At the same time, it scales to thousands of nodes and multi hour queries using the Spark …
Spark Declarative Pipelines Programming Guide
Spark Declarative Pipelines (SDP) is a declarative framework for building reliable, maintainable, and testable data pipelines on Spark. SDP simplifies ETL development by allowing you to focus on the …
Quickstart: DataFrame — PySpark 4.1.0 documentation - Apache Spark
DataFrame and Spark SQL share the same execution engine so they can be interchangeably used seamlessly. For example, you can register the DataFrame as a table and run a SQL easily as below:
Spark Connect | Apache Spark
Check out the guide on migrating from Spark JVM to Spark Connect to learn more about how to write code that works with Spark Connect. Also, check out how to build Spark Connect custom extensions …