About 50 results
Open links in new tab
  1. Apache Spark™ - Unified Engine for large-scale data analytics

    Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

  2. Overview - Spark 4.1.0 Documentation

    If you’d like to build Spark from source, visit Building Spark. Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS), and it should run on any platform that runs a supported version of Java.

  3. Downloads - Apache Spark

    Spark docker images are available from Dockerhub under the accounts of both The Apache Software Foundation and Official Images. Note that, these images contain non-ASF software and may be …

  4. Documentation - Apache Spark

    The documentation linked to above covers getting started with Spark, as well the built-in components MLlib, Spark Streaming, and GraphX. In addition, this page lists other resources for learning Spark.

  5. Quick Start - Spark 4.1.0 Documentation

    To follow along with this guide, first, download a packaged release of Spark from the Spark website. Since we won’t be using HDFS, you can download a package for any version of Hadoop.

  6. Examples - Apache Spark

    Spark allows you to perform DataFrame operations with programmatic APIs, write SQL, perform streaming analyses, and do machine learning. Spark saves you from learning multiple frameworks …

  7. Spark SQL & DataFrames | Apache Spark

    Spark SQL includes a cost-based optimizer, columnar storage and code generation to make queries fast. At the same time, it scales to thousands of nodes and multi hour queries using the Spark …

  8. Spark Declarative Pipelines Programming Guide

    Spark Declarative Pipelines (SDP) is a declarative framework for building reliable, maintainable, and testable data pipelines on Spark. SDP simplifies ETL development by allowing you to focus on the …

  9. Spark Streaming - Spark 4.1.0 Documentation

    Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, …

  10. Quickstart: DataFrame — PySpark 4.1.0 documentation - Apache Spark

    DataFrame and Spark SQL share the same execution engine so they can be interchangeably used seamlessly. For example, you can register the DataFrame as a table and run a SQL easily as below: