Configure Apache Spark 3.5 with Delta Lake and MinIO object storage for ACID transactions, data versioning, and distributed analytics processing. Includes a complete setup for a production-grade data lake architecture.
Prerequisites
- 4GB RAM minimum
- Java 17 compatible system
- Network access for package downloads
- sudo privileges
What this solves
Apache Spark 3.5 with Delta Lake provides ACID transactions and versioned data management for big data workloads, while MinIO offers S3-compatible object storage for distributed data lakes. This setup enables reliable data processing with transaction guarantees, time travel capabilities, and schema evolution support essential for enterprise analytics pipelines.
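Delta's versioning and time travel hinge on an append-only transaction log: each commit is a numbered JSON file of add/remove file actions, and a snapshot at any version is obtained by replaying the log up to that commit. A toy sketch of that replay idea (illustration only, not Delta Lake's actual implementation):

```python
# Toy model of a Delta-style transaction log (illustration only, NOT
# Delta Lake's real code): each commit appends add/remove file actions.
log = [
    [{"add": "part-000.parquet"}, {"add": "part-001.parquet"}],     # version 0
    [{"remove": "part-000.parquet"}, {"add": "part-002.parquet"}],  # version 1
]

def files_at_version(log, version):
    """Replay commits 0..version to reconstruct that snapshot's file set."""
    files = set()
    for commit in log[: version + 1]:
        for action in commit:
            if "add" in action:
                files.add(action["add"])
            else:
                files.discard(action["remove"])
    return sorted(files)

# "Time travel": read any historical version by replaying less of the log.
print(files_at_version(log, 0))  # ['part-000.parquet', 'part-001.parquet']
print(files_at_version(log, 1))  # ['part-001.parquet', 'part-002.parquet']
```

Because commits are atomic appends, readers always see a complete version, never a half-written one; that is the core of the ACID guarantee this setup provides.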
Step-by-step installation
Update system packages and install prerequisites
Start by updating your package manager and installing Java 17 and essential build tools for Spark.
sudo apt update && sudo apt upgrade -y
sudo apt install -y openjdk-17-jdk wget curl unzip python3 python3-pip
Configure Java environment variables
Set up the JAVA_HOME environment variable so Spark can locate the Java installation. Note that /etc/environment does not expand variables or honor export statements, so use a profile script instead.
echo 'export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64' | sudo tee /etc/profile.d/java.sh
echo 'export PATH=$JAVA_HOME/bin:$PATH' | sudo tee -a /etc/profile.d/java.sh
source /etc/profile.d/java.sh
java -version
Create Spark user and directories
Create a dedicated user for Spark operations and set up the required directory structure with proper permissions.
sudo useradd -m -s /bin/bash spark
sudo mkdir -p /opt/spark /opt/spark/logs /opt/spark/work
sudo chown -R spark:spark /opt/spark
Download and install Apache Spark 3.5
Download Spark 3.5 with Hadoop support and extract it to the installation directory.
cd /tmp
wget https://archive.apache.org/dist/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz
tar -xzf spark-3.5.0-bin-hadoop3.tgz
sudo mv spark-3.5.0-bin-hadoop3/* /opt/spark/
sudo chown -R spark:spark /opt/spark
Configure Spark environment
Set up Spark configuration files and environment variables for optimal performance and Delta Lake integration.
#!/usr/bin/env bash
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
export SPARK_HOME=/opt/spark
export SPARK_CONF_DIR=/opt/spark/conf
export SPARK_LOG_DIR=/opt/spark/logs
export SPARK_WORKER_DIR=/opt/spark/work
export PYSPARK_PYTHON=python3
export SPARK_MASTER_HOST=localhost
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_WORKER_WEBUI_PORT=8081
Create Spark defaults configuration
Configure Spark with Delta Lake dependencies and MinIO S3-compatible settings for object storage integration.
# Delta Lake Configuration
spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
# S3/MinIO Configuration
spark.hadoop.fs.s3a.endpoint=http://localhost:9000
spark.hadoop.fs.s3a.access.key=minioadmin
spark.hadoop.fs.s3a.secret.key=minioadmin123
spark.hadoop.fs.s3a.path.style.access=true
spark.hadoop.fs.s3a.connection.ssl.enabled=false
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
# Performance Optimization
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.sql.adaptive.enabled=true
spark.sql.adaptive.coalescePartitions.enabled=true
spark.executor.memory=2g
spark.driver.memory=1g
spark.executor.cores=2
spark.default.parallelism=8
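Before restarting services after editing spark-defaults.conf, it can help to sanity-check the file. This small helper (hypothetical, not part of Spark) parses lines the way Spark's properties loader accepts them:

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf content into a dict.

    Spark accepts 'key=value' as well as whitespace-separated 'key value'
    lines; '#' starts a comment. Helper for sanity checks only.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=" in line:
            key, _, value = line.partition("=")
        else:
            key, _, value = line.partition(" ")
        conf[key.strip()] = value.strip()
    return conf

sample = """\
# Delta Lake Configuration
spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
spark.executor.memory=2g
"""
conf = parse_spark_defaults(sample)
print(conf["spark.executor.memory"])  # 2g
```

Catching a typo in a key name here is cheaper than debugging a Spark job that silently ignored an unknown property.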
Download Delta Lake and AWS SDK JARs
Download the required JAR files for Delta Lake functionality and S3A filesystem support with MinIO. Note that Delta Lake 2.4 targets Spark 3.4; Spark 3.5 requires Delta Lake 3.x, where the core artifact was renamed from delta-core to delta-spark.
cd /opt/spark/jars
sudo wget https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/3.0.0/delta-spark_2.12-3.0.0.jar
sudo wget https://repo1.maven.org/maven2/io/delta/delta-storage/3.0.0/delta-storage-3.0.0.jar
sudo wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar
sudo wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.367/aws-java-sdk-bundle-1.12.367.jar
sudo chown spark:spark *.jar
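The download URLs above all follow Maven Central's standard repository layout, where groupId dots become path segments. A small helper can derive URLs from artifact coordinates, which makes version bumps less error-prone. (Note the Delta coordinate shown here assumes Spark 3.5, which pairs with Delta 3.x under the renamed io.delta:delta-spark_2.12 artifact.)

```python
def maven_jar_url(group, artifact, version):
    """Build a Maven Central URL from the standard repo layout:
    groupId dots map to path slashes."""
    return ("https://repo1.maven.org/maven2/"
            f"{group.replace('.', '/')}/{artifact}/{version}/"
            f"{artifact}-{version}.jar")

# The JAR coordinates this guide relies on
jars = [
    ("io.delta", "delta-spark_2.12", "3.0.0"),
    ("io.delta", "delta-storage", "3.0.0"),
    ("org.apache.hadoop", "hadoop-aws", "3.3.4"),
    ("com.amazonaws", "aws-java-sdk-bundle", "1.12.367"),
]
for g, a, v in jars:
    print(maven_jar_url(g, a, v))
```

Keeping hadoop-aws at 3.3.4 matters: it must match the Hadoop version bundled with the spark-3.5.0-bin-hadoop3 distribution.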
Install and configure MinIO server
Set up MinIO object storage server to provide S3-compatible storage for Delta Lake data files.
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
sudo mv minio /usr/local/bin/
sudo mkdir -p /opt/minio/data
sudo useradd -r minio-user
sudo chown -R minio-user:minio-user /opt/minio
Create MinIO systemd service
Configure MinIO as a systemd service with proper security settings and automatic startup.
[Unit]
Description=MinIO Object Storage Server
After=network.target
[Service]
Type=simple
User=minio-user
Group=minio-user
WorkingDirectory=/opt/minio
Environment=MINIO_ROOT_USER=minioadmin
Environment=MINIO_ROOT_PASSWORD=minioadmin123
ExecStart=/usr/local/bin/minio server /opt/minio/data --address :9000 --console-address :9001
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Start MinIO and create data bucket
Enable and start the MinIO service, then create a bucket for Delta Lake data storage.
sudo systemctl daemon-reload
sudo systemctl enable --now minio
sudo systemctl status minio
Install MinIO client
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/
Configure MinIO client and create bucket
mc alias set local http://localhost:9000 minioadmin minioadmin123
mc mb local/delta-lake
mc ls local
Create Spark master and worker systemd services
Set up systemd services for Spark master and worker nodes to run as managed services.
[Unit]
Description=Apache Spark Master
After=network.target
[Service]
Type=forking
User=spark
Group=spark
WorkingDirectory=/opt/spark
Environment=JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
Environment=SPARK_HOME=/opt/spark
ExecStart=/opt/spark/sbin/start-master.sh
ExecStop=/opt/spark/sbin/stop-master.sh
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
[Unit]
Description=Apache Spark Worker
After=network.target spark-master.service
Requires=spark-master.service
[Service]
Type=forking
User=spark
Group=spark
WorkingDirectory=/opt/spark
Environment=JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
Environment=SPARK_HOME=/opt/spark
ExecStart=/opt/spark/sbin/start-worker.sh spark://localhost:7077
ExecStop=/opt/spark/sbin/stop-worker.sh
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Start Spark services
Enable and start the Spark master and worker services to create a working Spark cluster.
sudo chmod +x /opt/spark/sbin/*.sh
sudo systemctl daemon-reload
sudo systemctl enable --now spark-master
sudo systemctl enable --now spark-worker
sudo systemctl status spark-master
sudo systemctl status spark-worker
Configure firewall for Spark and MinIO
Open the necessary ports for Spark web UI, cluster communication, and MinIO access.
sudo ufw allow 7077/tcp # Spark Master
sudo ufw allow 8080/tcp # Spark Master Web UI
sudo ufw allow 8081/tcp # Spark Worker Web UI
sudo ufw allow 9000/tcp # MinIO API
sudo ufw allow 9001/tcp # MinIO Console
sudo ufw reload
Create Delta Lake test application
Create a Python application at /opt/spark/test_delta.py to test Delta Lake ACID transactions and data versioning with MinIO storage.
from pyspark.sql import SparkSession
from delta import *
# Create Spark session with Delta Lake configuration
spark = SparkSession.builder \
.appName("DeltaLakeTest") \
.master("spark://localhost:7077") \
.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
.config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
.config("spark.hadoop.fs.s3a.endpoint", "http://localhost:9000") \
.config("spark.hadoop.fs.s3a.access.key", "minioadmin") \
.config("spark.hadoop.fs.s3a.secret.key", "minioadmin123") \
.config("spark.hadoop.fs.s3a.path.style.access", "true") \
.config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false") \
.config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
.getOrCreate()
# Create test data
data = [(1, "John", 25), (2, "Jane", 30), (3, "Bob", 35)]
columns = ["id", "name", "age"]
df = spark.createDataFrame(data, columns)
# Write as Delta table to MinIO
print("Writing Delta table...")
df.write.format("delta").mode("overwrite").save("s3a://delta-lake/users")
# Read Delta table
print("Reading Delta table...")
delta_df = spark.read.format("delta").load("s3a://delta-lake/users")
delta_df.show()
# Perform an update operation (ACID transaction)
print("Updating records...")
delta_table = DeltaTable.forPath(spark, "s3a://delta-lake/users")
delta_table.update(
condition="id = 1",
set={"age": "26"}
)
# Show updated data
print("After update:")
delta_table.toDF().show()
# Show table history (versioning)
print("Table history:")
delta_table.history().show()
spark.stop()
Install Python dependencies and run test
Install the required Python packages and execute the Delta Lake test to verify ACID transactions work correctly.
sudo pip3 install pyspark==3.5.0 delta-spark==3.0.0
sudo chown spark:spark /opt/spark/test_delta.py
sudo -u spark python3 /opt/spark/test_delta.py
Verify your setup
Check that all services are running correctly and the Delta Lake integration is functional.
# Check service status
sudo systemctl status minio spark-master spark-worker
# Verify Spark cluster
curl -s http://localhost:8080 | grep -i "spark master"
# Check MinIO buckets
mc ls local/delta-lake
# Test Spark shell with Delta Lake
/opt/spark/bin/spark-shell --packages io.delta:delta-spark_2.12:3.0.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
Configure ACID transaction settings
Optimize Delta Lake performance settings
Configure advanced Delta Lake settings for better performance and transaction isolation. The spark.databricks.delta.* prefix is historical and is used by open-source Delta Lake as well; options your Delta version does not support are silently ignored, and a few (such as optimized writes and auto compaction) may behave differently outside Databricks.
# Delta Lake Performance Settings
spark.databricks.delta.retentionDurationCheck.enabled=false
spark.databricks.delta.vacuum.parallelDelete.enabled=true
spark.databricks.delta.merge.repartitionBeforeWrite.enabled=true
spark.databricks.delta.optimizeWrite.enabled=true
spark.databricks.delta.autoCompact.enabled=true
# Transaction Isolation
spark.databricks.delta.properties.defaults.isolation.level=WriteSerializable
spark.databricks.delta.properties.defaults.checkpointInterval=10
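Under WriteSerializable isolation, Delta enforces consistency with optimistic concurrency control: a writer notes the log version it read, does its work, then commits only if no commit has landed since; otherwise it must re-read and retry. A toy sketch of that commit check (deliberately stricter than real Delta, which also allows logically non-conflicting concurrent commits to succeed):

```python
class ToyLog:
    """Minimal optimistic-concurrency commit log (illustration only,
    not Delta Lake's actual conflict-detection logic)."""

    def __init__(self):
        self.commits = []

    def latest_version(self):
        return len(self.commits) - 1  # -1 means the table is empty

    def try_commit(self, read_version, actions):
        # Commit succeeds only if nobody committed since we read.
        if self.latest_version() != read_version:
            return False  # conflict: caller must re-read and retry
        self.commits.append(actions)
        return True

log = ToyLog()
v = log.latest_version()                 # both writers read version -1
print(log.try_commit(v, ["add a"]))      # first writer wins: True
print(log.try_commit(v, ["add b"]))      # second writer conflicts: False
print(log.try_commit(log.latest_version(), ["add b"]))  # retry: True
```

This is why Delta never needs long-held locks: conflicts are detected at commit time against the transaction log, and losers simply retry on a fresh snapshot.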
Create production Delta Lake table with schema evolution
Demonstrate advanced Delta Lake features including schema evolution and time travel capabilities.
from pyspark.sql import SparkSession
from pyspark.sql.types import *
from delta import *
import datetime
spark = SparkSession.builder \
.appName("ProductionDeltaExample") \
.master("spark://localhost:7077") \
.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
.config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
.getOrCreate()
# Create initial schema
schema = StructType([
StructField("transaction_id", StringType(), False),
StructField("customer_id", IntegerType(), False),
StructField("amount", DoubleType(), False),
StructField("timestamp", TimestampType(), False)
])
# Generate sample transaction data
from datetime import datetime, timedelta
import random
transactions = []
for i in range(1000):
transactions.append((
f"txn_{i:05d}",
random.randint(1, 100),
round(random.uniform(10.0, 500.0), 2),
datetime.now() - timedelta(days=random.randint(0, 30))
))
df = spark.createDataFrame(transactions, schema)
# Write with partitioning for better performance
print("Creating partitioned Delta table...")
df.write.format("delta") \
.mode("overwrite") \
.partitionBy("customer_id") \
.save("s3a://delta-lake/transactions")
# Demonstrate time travel
print("\nTable versions:")
delta_table = DeltaTable.forPath(spark, "s3a://delta-lake/transactions")
delta_table.history().select("version", "timestamp", "operation").show()
# Schema evolution: add a new column
print("\nAdding new column (schema evolution)...")
new_data = [("txn_99999", 101, 299.99, datetime.now(), "credit_card")]
new_columns = ["transaction_id", "customer_id", "amount", "timestamp", "payment_method"]
new_df = spark.createDataFrame(new_data, new_columns)
new_df.write.format("delta") \
.mode("append") \
.option("mergeSchema", "true") \
.save("s3a://delta-lake/transactions")
print("\nSchema after evolution:")
spark.read.format("delta").load("s3a://delta-lake/transactions").printSchema()
spark.stop()
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Spark fails to start | Java not found or wrong version | Verify JAVA_HOME: echo $JAVA_HOME && java -version |
| Delta Lake JARs not found | JAR files missing or wrong location | Check JARs in /opt/spark/jars: ls -la /opt/spark/jars/delta* |
| MinIO connection refused | MinIO service not running | Restart MinIO: sudo systemctl restart minio |
| S3A filesystem errors | Wrong endpoint or credentials | Verify MinIO config: mc admin info local |
| Permission denied on logs | Incorrect directory ownership | Fix ownership: sudo chown -R spark:spark /opt/spark |
| Worker not connecting to master | Firewall blocking ports | Check port 7077: ss -tlnp \| grep 7077 |
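When triaging the symptoms above, a quick TCP check of each service port narrows things down fast. A generic helper (nothing Spark- or MinIO-specific) might look like:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ports used by this setup
for name, port in [("Spark master", 7077), ("Spark master UI", 8080),
                   ("Spark worker UI", 8081), ("MinIO API", 9000),
                   ("MinIO console", 9001)]:
    status = "open" if port_open("localhost", port) else "CLOSED"
    print(f"{name:16} {port}: {status}")
```

A CLOSED port usually means the service is down or the firewall rule is missing; an open port with application errors points at configuration instead.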
Next steps
- Configure MinIO with Apache Spark 3.5 for big data analytics and object storage integration
- Set up Spark Streaming with Kafka and Delta Lake for real-time analytics
- Implement Spark SQL performance optimization with catalyst optimizer
- Configure Spark on Kubernetes with cluster autoscaling for dynamic workloads
Automated install script
Run this script to automate the entire setup:
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Configuration
SPARK_VERSION="3.5.0"
DELTA_VERSION="3.0.0"
HADOOP_AWS_VERSION="3.3.4"
AWS_SDK_VERSION="1.12.367"
MINIO_USER="minio"
MINIO_ACCESS_KEY="${MINIO_ACCESS_KEY:-minioadmin}"
MINIO_SECRET_KEY="${MINIO_SECRET_KEY:-minioadmin}"
MINIO_DATA_DIR="/opt/minio/data"
# Error handling
cleanup() {
echo -e "${RED}[ERROR]${NC} Installation failed. Cleaning up..."
systemctl stop minio 2>/dev/null || true
systemctl stop spark-master 2>/dev/null || true
systemctl stop spark-worker 2>/dev/null || true
userdel -r spark 2>/dev/null || true
userdel -r minio 2>/dev/null || true
rm -rf /opt/spark /opt/minio 2>/dev/null || true
}
trap cleanup ERR
print_usage() {
echo "Usage: $0 [OPTIONS]"
echo "Options:"
echo " --minio-access-key KEY MinIO access key (default: minioadmin)"
echo " --minio-secret-key KEY MinIO secret key (default: minioadmin)"
echo " -h, --help Show this help message"
}
# Parse arguments
while [[ $# -gt 0 ]]; do
case $1 in
--minio-access-key)
MINIO_ACCESS_KEY="$2"
shift 2
;;
--minio-secret-key)
MINIO_SECRET_KEY="$2"
shift 2
;;
-h|--help)
print_usage
exit 0
;;
*)
echo -e "${RED}Unknown option: $1${NC}"
print_usage
exit 1
;;
esac
done
# Check if running as root
if [[ $EUID -ne 0 ]]; then
echo -e "${RED}[ERROR]${NC} This script must be run as root"
exit 1
fi
# Detect distro
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_UPDATE="apt update && apt upgrade -y"
PKG_INSTALL="apt install -y"
JAVA_HOME_PATH="/usr/lib/jvm/java-17-openjdk-amd64"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_UPDATE="dnf update -y"
PKG_INSTALL="dnf install -y"
JAVA_HOME_PATH="/usr/lib/jvm/java-17-openjdk"
;;
amzn)
PKG_MGR="yum"
PKG_UPDATE="yum update -y"
PKG_INSTALL="yum install -y"
JAVA_HOME_PATH="/usr/lib/jvm/java-17-openjdk"
;;
*)
echo -e "${RED}[ERROR]${NC} Unsupported distro: $ID"
exit 1
;;
esac
else
echo -e "${RED}[ERROR]${NC} Cannot detect OS distribution"
exit 1
fi
echo -e "${GREEN}[1/10]${NC} Updating system packages and installing prerequisites..."
$PKG_UPDATE
if [[ "$PKG_MGR" == "apt" ]]; then
$PKG_INSTALL openjdk-17-jdk wget curl unzip python3 python3-pip tar
else
$PKG_INSTALL java-17-openjdk java-17-openjdk-devel wget curl unzip python3 python3-pip tar
fi
echo -e "${GREEN}[2/10]${NC} Configuring Java environment..."
cat > /etc/profile.d/java.sh << EOF
export JAVA_HOME=$JAVA_HOME_PATH
export PATH=\$JAVA_HOME/bin:\$PATH
EOF
export JAVA_HOME="$JAVA_HOME_PATH"
export PATH="$JAVA_HOME/bin:$PATH"
echo -e "${GREEN}[3/10]${NC} Creating Spark user and directories..."
useradd -m -s /bin/bash spark || true
mkdir -p /opt/spark/{logs,work,conf}
chown -R spark:spark /opt/spark
echo -e "${GREEN}[4/10]${NC} Downloading and installing Apache Spark 3.5..."
cd /tmp
wget -q "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop3.tgz"
tar -xzf "spark-${SPARK_VERSION}-bin-hadoop3.tgz"
cp -r "spark-${SPARK_VERSION}-bin-hadoop3"/* /opt/spark/
chown -R spark:spark /opt/spark
chmod 755 /opt/spark/bin/*
echo -e "${GREEN}[5/10]${NC} Configuring Spark environment..."
cat > /opt/spark/conf/spark-env.sh << EOF
#!/usr/bin/env bash
export JAVA_HOME=$JAVA_HOME_PATH
export SPARK_HOME=/opt/spark
export SPARK_CONF_DIR=/opt/spark/conf
export SPARK_LOG_DIR=/opt/spark/logs
export SPARK_WORKER_DIR=/opt/spark/work
export PYSPARK_PYTHON=python3
export SPARK_MASTER_HOST=localhost
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_WORKER_WEBUI_PORT=8081
EOF
cat > /opt/spark/conf/spark-defaults.conf << EOF
# Delta Lake Configuration
spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
# S3/MinIO Configuration
spark.hadoop.fs.s3a.endpoint=http://localhost:9000
spark.hadoop.fs.s3a.access.key=$MINIO_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key=$MINIO_SECRET_KEY
spark.hadoop.fs.s3a.path.style.access=true
spark.hadoop.fs.s3a.connection.ssl.enabled=false
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
# Performance Optimization
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.sql.adaptive.enabled=true
spark.sql.adaptive.coalescePartitions.enabled=true
spark.executor.memory=2g
spark.driver.memory=1g
spark.executor.cores=2
spark.default.parallelism=8
EOF
chown -R spark:spark /opt/spark/conf
chmod 644 /opt/spark/conf/*
chmod 755 /opt/spark/conf/spark-env.sh
echo -e "${GREEN}[6/10]${NC} Downloading Delta Lake and AWS SDK JARs..."
cd /opt/spark/jars
wget -q "https://repo1.maven.org/maven2/io/delta/delta-spark_2.12/${DELTA_VERSION}/delta-spark_2.12-${DELTA_VERSION}.jar"
wget -q "https://repo1.maven.org/maven2/io/delta/delta-storage/${DELTA_VERSION}/delta-storage-${DELTA_VERSION}.jar"
wget -q "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/${HADOOP_AWS_VERSION}/hadoop-aws-${HADOOP_AWS_VERSION}.jar"
wget -q "https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/${AWS_SDK_VERSION}/aws-java-sdk-bundle-${AWS_SDK_VERSION}.jar"
chown spark:spark *.jar
echo -e "${GREEN}[7/10]${NC} Setting up MinIO..."
useradd -r -s /bin/false -d /opt/minio minio || true
mkdir -p /opt/minio/bin "$MINIO_DATA_DIR"
cd /opt/minio/bin
wget -q https://dl.min.io/server/minio/release/linux-amd64/minio
chmod 755 minio
chown -R minio:minio /opt/minio
echo -e "${GREEN}[8/10]${NC} Creating systemd services..."
cat > /etc/systemd/system/minio.service << EOF
[Unit]
Description=MinIO
Documentation=https://min.io/docs/minio/linux/index.html
Wants=network-online.target
After=network-online.target
[Service]
WorkingDirectory=/opt/minio
User=minio
Group=minio
ProtectProc=invisible
Environment=MINIO_ROOT_USER=$MINIO_ACCESS_KEY
Environment=MINIO_ROOT_PASSWORD=$MINIO_SECRET_KEY
ExecStart=/opt/minio/bin/minio server $MINIO_DATA_DIR --console-address ":9001"
Restart=always
LimitNOFILE=65536
TasksMax=infinity
TimeoutStopSec=infinity
SendSIGKILL=no
[Install]
WantedBy=multi-user.target
EOF
cat > /etc/systemd/system/spark-master.service << EOF
[Unit]
Description=Apache Spark Master
After=network.target
[Service]
Type=forking
User=spark
Group=spark
WorkingDirectory=/opt/spark
Environment=SPARK_HOME=/opt/spark
Environment=JAVA_HOME=$JAVA_HOME_PATH
ExecStart=/opt/spark/sbin/start-master.sh
ExecStop=/opt/spark/sbin/stop-master.sh
Restart=always
[Install]
WantedBy=multi-user.target
EOF
cat > /etc/systemd/system/spark-worker.service << EOF
[Unit]
Description=Apache Spark Worker
After=spark-master.service
[Service]
Type=forking
User=spark
Group=spark
WorkingDirectory=/opt/spark
Environment=SPARK_HOME=/opt/spark
Environment=JAVA_HOME=$JAVA_HOME_PATH
ExecStart=/opt/spark/sbin/start-worker.sh spark://localhost:7077
ExecStop=/opt/spark/sbin/stop-worker.sh
Restart=always
[Install]
WantedBy=multi-user.target
EOF
echo -e "${GREEN}[9/10]${NC} Starting services..."
systemctl daemon-reload
systemctl enable minio spark-master spark-worker
systemctl start minio
sleep 5
systemctl start spark-master
sleep 5
systemctl start spark-worker
echo -e "${GREEN}[10/10]${NC} Verifying installation..."
if systemctl is-active --quiet minio; then
echo -e "${GREEN}✓${NC} MinIO is running"
else
echo -e "${RED}✗${NC} MinIO failed to start"
exit 1
fi
if systemctl is-active --quiet spark-master; then
echo -e "${GREEN}✓${NC} Spark Master is running"
else
echo -e "${RED}✗${NC} Spark Master failed to start"
exit 1
fi
if systemctl is-active --quiet spark-worker; then
echo -e "${GREEN}✓${NC} Spark Worker is running"
else
echo -e "${RED}✗${NC} Spark Worker failed to start"
exit 1
fi
echo -e "${GREEN}Installation completed successfully!${NC}"
echo
echo "Access points:"
echo " - MinIO Console: http://localhost:9001"
echo " - MinIO API: http://localhost:9000"
echo " - Spark Master UI: http://localhost:8080"
echo " - Spark Worker UI: http://localhost:8081"
echo
echo "Credentials:"
echo " - MinIO Access Key: $MINIO_ACCESS_KEY"
echo " - MinIO Secret Key: $MINIO_SECRET_KEY"
Review the script before running, then execute it as root: sudo bash install.sh