Advanced Nomad job templates and deployment strategies with rolling updates and canary deployments

Advanced · 45 min · Apr 12, 2026
Ubuntu 24.04 Debian 12 AlmaLinux 9 Rocky Linux 9

Master production-grade Nomad job templates with HCL syntax, implement rolling updates with health checks, and deploy advanced blue-green and canary deployment patterns for resilient containerized workloads.

Prerequisites

  • Root or sudo access
  • At least 8GB RAM per node
  • 3+ node cluster recommended
  • Consul service discovery
  • Docker runtime

What this solves

Nomad job templates provide declarative infrastructure for container orchestration, but basic deployments lack the sophistication needed for production environments. This tutorial covers advanced HCL templating, rolling deployment strategies with health checks, blue-green deployments, and canary release patterns that ensure zero-downtime updates and automatic rollbacks.

Prerequisites and setup

Install Nomad cluster

Set up a multi-node Nomad cluster with Consul integration for service discovery and coordination.

# Debian/Ubuntu
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install -y nomad consul

# RHEL-based (AlmaLinux, Rocky Linux)
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
sudo dnf install -y nomad consul

Configure Consul server

Set up Consul for service discovery and distributed coordination between Nomad nodes.

datacenter = "dc1"
data_dir = "/opt/consul"
log_level = "INFO"
server = true
bootstrap_expect = 3
retry_join = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
bind_addr = "{{ GetInterfaceIP \"eth0\" }}"
client_addr = "0.0.0.0"
ui_config {
  enabled = true
}
connect {
  enabled = true
}
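Nomad client nodes also need a local Consul agent, running in client mode and joined to the same servers. A minimal sketch, reusing the illustrative server IPs from above:

```hcl
# /etc/consul.d/consul.hcl on Nomad client nodes (values are illustrative)
datacenter  = "dc1"
data_dir    = "/opt/consul"
server      = false
retry_join  = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
bind_addr   = "{{ GetInterfaceIP \"eth0\" }}"
client_addr = "127.0.0.1"
```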

Configure Nomad server

Configure Nomad server nodes with Consul integration and cluster coordination.

datacenter = "dc1"
data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0"

server {
  enabled = true
  bootstrap_expect = 3
  server_join {
    retry_join = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
  }
}

consul {
  address = "127.0.0.1:8500"
  server_service_name = "nomad"
  client_service_name = "nomad-client"
  auto_advertise = true
  server_auto_join = true
  client_auto_join = true
}

ui {
  enabled = true
}
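The scheduling examples later in this guide match on node metadata such as meta.instance_type and meta.availability_zone; that metadata must be declared in each client's configuration. A minimal client config sketch (all values illustrative):

```hcl
# /etc/nomad.d/client.hcl on worker nodes
datacenter = "dc1"
data_dir   = "/opt/nomad/data"

client {
  enabled = true
  servers = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]

  # Node metadata referenced by constraints, affinities, and spreads in this guide
  meta {
    instance_type      = "web"
    availability_zone  = "us-east-1a"
    storage_type       = "ssd"
    instance_lifecycle = "on-demand"
  }
}
```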

Start services

Enable and start Consul and Nomad services on all cluster nodes.

sudo systemctl enable --now consul
sudo systemctl enable --now nomad
sudo systemctl status consul nomad

Advanced HCL job templates

Create parameterized job template

Build a flexible job template using HCL variables and templating for reusable deployment patterns.

variable "app_name" {
  description = "Application name for service registration"
  type        = string
  default     = "web-app"
}

variable "app_version" {
  description = "Application version tag"
  type        = string
  default     = "latest"
}

variable "instance_count" {
  description = "Number of application instances"
  type        = number
  default     = 3
}

variable "resource_cpu" {
  description = "CPU allocation in MHz"
  type        = number
  default     = 500
}

variable "resource_memory" {
  description = "Memory allocation in MB"
  type        = number
  default     = 512
}

job "${var.app_name}" {
  datacenters = ["dc1"]
  type        = "service"
  priority    = 50

  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  constraint {
    attribute = "${meta.instance_type}"
    operator  = "regexp"
    value     = "(web|app)"
  }

  update {
    max_parallel      = 2
    health_check      = "checks"
    min_healthy_time  = "30s"
    healthy_deadline  = "5m"
    progress_deadline = "10m"
    auto_revert       = true
    canary           = 1
    stagger          = "30s"
  }

  group "${var.app_name}-group" {
    count = var.instance_count

    network {
      port "http" {
        static = 8080
      }
    }

    service {
      name = "${var.app_name}"
      port = "http"
      tags = [
        "version-${var.app_version}",
        "traefik.enable=true",
        "traefik.http.routers.${var.app_name}.rule=Host(`${var.app_name}.example.com`)"
      ]

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "3s"
      }

      check {
        type     = "tcp"
        interval = "10s"
        timeout  = "3s"
      }
    }

    restart {
      attempts = 3
      interval = "5m"
      delay    = "15s"
      mode     = "fail"
    }

    task "${var.app_name}-task" {
      driver = "docker"

      config {
        image = "nginx:${var.app_version}"
        ports = ["http"]
        
        mount {
          type   = "bind"
          source = "local/nginx.conf"
          target = "/etc/nginx/nginx.conf"
        }
      }

      template {
        data = <<-EOH
        worker_processes auto;
        events {
            worker_connections 1024;
        }
        http {
            server {
                listen 8080;
                location /health {
                    access_log off;
                    return 200 "healthy";
                    add_header Content-Type text/plain;
                }
                location / {
                    root /usr/share/nginx/html;
                    index index.html;
                }
            }
        }
        EOH
        destination = "local/nginx.conf"
      }

      resources {
        cpu    = var.resource_cpu
        memory = var.resource_memory
      }

      env {
        APP_VERSION = "${var.app_version}"
        DATACENTER  = "${node.datacenter}"
        NODE_NAME   = "${attr.unique.hostname}"
      }

      logs {
        max_files     = 10
        max_file_size = 15
      }
    }
  }
}

Deploy with variables

Submit the job with custom variable values for different environments and configurations.

nomad job run \
  -var="app_name=frontend" \
  -var="app_version=v2.1.0" \
  -var="instance_count=5" \
  -var="resource_cpu=800" \
  -var="resource_memory=1024" \
  web-app-template.nomad
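For repeatable per-environment deployments, the same variables can live in a var-file instead of the command line. A sketch (filename and values are illustrative):

```hcl
# production.vars
app_name        = "frontend"
app_version     = "v2.1.0"
instance_count  = 5
resource_cpu    = 800
resource_memory = 1024
```

Submit it with nomad job run -var-file=production.vars web-app-template.nomad.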

Rolling deployment strategies

Configure advanced update blocks

Implement sophisticated rolling update strategies with health checks and automatic rollback capabilities.

job "rolling-app" {
  datacenters = ["dc1"]
  type        = "service"

  meta {
    # Surfaced to the service tags below as NOMAD_META_version
    version = "1.0.0"
  }

  update {
    max_parallel      = 2
    health_check      = "checks"
    min_healthy_time  = "30s"
    healthy_deadline  = "5m"
    progress_deadline = "10m"
    auto_revert       = true
    auto_promote      = false
    canary           = 2
    stagger          = "30s"
  }

  group "app" {
    count = 6

    network {
      port "http" {}
    }

    service {
      name = "rolling-app"
      port = "http"
      tags = ["version-${NOMAD_META_version}"]

      check {
        name     = "HTTP Health"
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "3s"
        check_restart {
          limit           = 3
          grace           = "10s"
          ignore_warnings = false
        }
      }

      check {
        name     = "Application Ready"
        type     = "script"
        task     = "web"
        command  = "/bin/sh"
        args     = ["-c", "curl -f http://localhost:${NOMAD_PORT_http}/ready"]
        interval = "15s"
        timeout  = "5s"
      }
    }

    task "web" {
      driver = "docker"
      
      config {
        image = "nginx:latest"
        ports = ["http"]
      }

      resources {
        cpu    = 500
        memory = 512
      }

      kill_timeout = "30s"
      kill_signal  = "SIGTERM"

      shutdown_delay = "5s"
    }
  }
}

Monitor rolling deployment

Track deployment progress and health status during rolling updates.

nomad job run rolling-update.nomad
nomad job status rolling-app
nomad job deployments rolling-app
nomad deployment status <deployment-id>
nomad deployment promote <deployment-id>
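If you prefer hands-off rollouts, the update block's auto_promote flag promotes canaries automatically once they pass their health checks, so no manual nomad deployment promote is needed. A sketch:

```hcl
update {
  max_parallel     = 2
  canary           = 2
  health_check     = "checks"
  min_healthy_time = "30s"
  auto_promote     = true  # promote canaries automatically once healthy
  auto_revert      = true  # roll back automatically if the new version never becomes healthy
}
```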

Blue-green deployment pattern

Create blue-green job template

Implement blue-green deployments using Nomad job versioning and service discovery integration.

variable "environment" {
  description = "Deployment environment (blue or green)"
  type        = string
  default     = "blue"
}

variable "app_version" {
  description = "Application version"
  type        = string
}

job "app-${var.environment}" {
  datacenters = ["dc1"]
  type        = "service"

  meta {
    environment = "${var.environment}"
    version     = "${var.app_version}"
  }

  group "app" {
    count = 3

    network {
      port "http" {}
    }

    service {
      name = "app-${var.environment}"
      port = "http"
      tags = [
        "environment-${var.environment}",
        "version-${var.app_version}",
        "traefik.enable=false"
      ]

      meta {
        environment = "${var.environment}"
        version     = "${var.app_version}"
      }

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "3s"
      }

      check {
        name     = "Deep Health Check"
        type     = "http"
        path     = "/health/deep"
        interval = "30s"
        timeout  = "10s"
      }
    }

    task "app" {
      driver = "docker"

      config {
        image = "myapp:${var.app_version}"
        ports = ["http"]
      }

      env {
        ENVIRONMENT = "${var.environment}"
        VERSION     = "${var.app_version}"
      }

      resources {
        cpu    = 1000
        memory = 1024
      }
    }
  }
}

Load balancer configuration job

job "app-router" {
  datacenters = ["dc1"]
  type        = "service"

  group "router" {
    count = 1

    network {
      port "http" {
        static = 80
      }
    }

    service {
      name = "app-router"
      port = "http"
      tags = [
        "traefik.enable=true",
        "traefik.http.routers.app.rule=Host(`app.example.com`)"
      ]
    }

    task "traefik" {
      driver = "docker"

      config {
        image = "traefik:v2.10"
        ports = ["http"]
        args = [
          "--api.dashboard=true",
          "--providers.consulcatalog.endpoint.address=127.0.0.1:8500",
          "--providers.consulcatalog.exposedByDefault=false",
          "--entrypoints.web.address=:80"
        ]
      }

      resources {
        cpu    = 200
        memory = 256
      }
    }
  }
}

Execute blue-green deployment

Deploy to the inactive environment and switch traffic after validation.

# Deploy green environment
nomad job run -var="environment=green" -var="app_version=v2.0.0" blue-green.nomad

Verify green deployment

nomad job status app-green
consul catalog services

Switch traffic by updating the service tags: edit the green service in the job file so traefik.enable is true, then re-submit

nomad job run -var="environment=green" -var="app_version=v2.0.0" blue-green.nomad

Stop blue environment after validation

nomad job stop app-blue
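To make the traffic switch itself declarative, the router tag can be driven by a variable instead of edited by hand; HCL2 conditional expressions make this a one-flag change. A sketch (the live variable is an assumption, not part of the template above):

```hcl
variable "live" {
  description = "Whether this environment receives router traffic"
  type        = bool
  default     = false
}

# In the service block of blue-green.nomad, derive the Traefik tag:
#   tags = [
#     "environment-${var.environment}",
#     "version-${var.app_version}",
#     var.live ? "traefik.enable=true" : "traefik.enable=false"
#   ]
```

Deploying with -var="live=true" then flips traffic to that environment on the next job run.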

Canary deployment implementation

Advanced canary configuration

Implement sophisticated canary deployments with traffic splitting and automatic promotion based on metrics.

job "canary-app" {
  datacenters = ["dc1"]
  type        = "service"

  meta {
    # Surfaced to the service tags below as NOMAD_META_version
    version = "2.0.0"
  }

  update {
    max_parallel      = 1
    health_check      = "checks"
    min_healthy_time  = "1m"
    healthy_deadline  = "10m"
    progress_deadline = "15m"
    auto_revert       = true
    auto_promote      = false
    canary           = 2
    stagger          = "1m"
  }

  group "app" {
    count = 6

    network {
      port "http" {}
      port "metrics" {}
    }

    service {
      name = "canary-app"
      port = "http"
      tags = [
        "version-${NOMAD_META_version}",
        "traefik.enable=true",
        "traefik.http.routers.canary-app.rule=Host(`app.example.com`)",
        "traefik.http.services.canary-app.loadbalancer.healthcheck.path=/health"
      ]

      canary_tags = [
        "version-${NOMAD_META_version}",
        "canary",
        "traefik.enable=true",
        "traefik.http.routers.canary-app-canary.rule=Host(`app.example.com`) && Headers(`X-Canary`, `true`)"
      ]
      # Percentage-based traffic splitting requires Traefik's weighted
      # round robin, which is set in dynamic configuration, not via tags.

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "3s"
      }

      check {
        name     = "Readiness Check"
        type     = "http"
        path     = "/ready"
        interval = "15s"
        timeout  = "5s"
      }

      check {
        name     = "Error Rate Check"
        type     = "script"
        task     = "app"
        command  = "/bin/sh"
        args = [
          "-c",
          "curl -s http://localhost:${NOMAD_PORT_metrics}/metrics | grep 'error_rate' | awk '{print $2}' | awk '$1 < 0.05 {exit 0} {exit 1}'"
        ]
        interval = "30s"
        timeout  = "10s"
      }
    }

    task "app" {
      driver = "docker"

      config {
        image = "myapp:latest"
        ports = ["http", "metrics"]
      }

      template {
        data = <<-EOH
        #!/bin/bash
        # Automated canary promotion script
        set -e

        DEPLOYMENT_ID=$(nomad deployment list -json | jq -r '.[0].ID')
        export DEPLOYMENT_ID

        # Wait for canary instances to be healthy
        echo "Waiting for canary instances..."
        timeout 300 bash -c 'until [ "$(nomad deployment status -json "$DEPLOYMENT_ID" | jq -r ".TaskGroups.app.HealthyAllocs")" -ge 2 ]; do sleep 10; done'

        # Monitor metrics for 5 minutes
        echo "Monitoring canary metrics..."
        METRICS_URL="http://localhost:{{ env "NOMAD_PORT_metrics" }}/metrics"
        for i in {1..30}; do
          ERROR_RATE=$(curl -s "$METRICS_URL" | grep 'error_rate' | awk '{print $2}')
          RESPONSE_TIME=$(curl -s "$METRICS_URL" | grep 'response_time_p95' | awk '{print $2}')

          if (( $(echo "$ERROR_RATE > 0.05" | bc -l) )) || (( $(echo "$RESPONSE_TIME > 1000" | bc -l) )); then
            echo "Canary metrics failed threshold, reverting deployment"
            nomad deployment fail "$DEPLOYMENT_ID"
            exit 1
          fi

          sleep 10
        done

        # Promote if all checks pass
        echo "Canary validation successful, promoting deployment"
        nomad deployment promote "$DEPLOYMENT_ID"
        EOH
        # Render into the shared alloc/ directory so the canary-monitor task can read it
        destination = "alloc/canary-check.sh"
        perms       = "755"
      }

      resources {
        cpu    = 500
        memory = 512
      }

      env {
        ENABLE_METRICS = "true"
        METRICS_PORT   = "${NOMAD_PORT_metrics}"
      }
    }

    task "canary-monitor" {
      driver = "exec"
      
      config {
        command = "/bin/bash"
        # Script rendered by the app task into the shared alloc/ directory
        args    = ["alloc/canary-check.sh"]
      }
      
      lifecycle {
        hook    = "poststart"
        sidecar = false
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}

Deploy and monitor canary

Execute canary deployment with automated monitoring and promotion logic.

# Deploy canary version
nomad job run canary-deployment.nomad

Monitor deployment progress

nomad job deployments canary-app
watch -n 5 'nomad deployment status <deployment-id>'

Manual promotion if needed

nomad deployment promote <deployment-id>

Manual rollback if issues detected

nomad deployment fail <deployment-id>

Advanced scheduling and constraints

Implement complex affinity rules

Configure advanced scheduling constraints using node attributes, metadata, and affinity rules for optimal placement.

job "distributed-app" {
  datacenters = ["dc1"]
  type        = "service"

  # Limit how many allocations may share one availability zone
  constraint {
    attribute = "${meta.availability_zone}"
    operator  = "distinct_property"
    value     = "3"
  }

  # Require SSD storage
  constraint {
    attribute = "${meta.storage_type}"
    value     = "ssd"
  }

  # Avoid spot instances for critical workloads
  constraint {
    attribute = "${meta.instance_lifecycle}"
    operator  = "!="
    value     = "spot"
  }

  affinity {
    attribute = "${meta.instance_type}"
    value     = "compute-optimized"
    weight    = 80
  }

  affinity {
    attribute = "${node.class}"
    value     = "production"
    weight    = 100
  }

  # Anti-affinity to spread across nodes
  spread {
    attribute = "${node.unique.id}"
    weight    = 100
  }

  # Spread across availability zones
  spread {
    attribute = "${meta.availability_zone}"
    weight    = 80
    target "us-east-1a" {
      percent = 34
    }
    target "us-east-1b" {
      percent = 33
    }
    target "us-east-1c" {
      percent = 33
    }
  }

  group "web" {
    count = 9

    # Group-level constraints
    constraint {
      attribute = "${attr.cpu.arch}"
      value     = "amd64"
    }

    # Ensure minimum resources available
    constraint {
      attribute = "${attr.memory.totalbytes}"
      operator  = ">="
      value     = "8589934592" # 8GB
    }

    network {
      port "http" {}
    }

    service {
      name = "distributed-web"
      port = "http"
      tags = [
        "zone-${meta.availability_zone}",
        "instance-${meta.instance_type}"
      ]

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "3s"
      }
    }

    task "web" {
      driver = "docker"

      config {
        image = "nginx:latest"
        ports = ["http"]
      }

      resources {
        cpu    = 1000
        memory = 2048
        
        device "nvidia/gpu" {
          count = 1
          
          constraint {
            attribute = "${device.attr.memory}"
            operator  = ">="
            value     = "8GiB"
          }
        }
      }

      env {
        AVAILABILITY_ZONE = "${meta.availability_zone}"
        INSTANCE_TYPE     = "${meta.instance_type}"
        NODE_ID           = "${node.unique.id}"
      }
    }
  }
}
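Note the difference between the two distinct operators: distinct_hosts takes no attribute and forces one allocation per client node, while distinct_property caps how many allocations may share one value of an attribute. A sketch of both:

```hcl
# One allocation per client node
constraint {
  operator = "distinct_hosts"
}

# At most 3 allocations per availability zone value
constraint {
  attribute = "${meta.availability_zone}"
  operator  = "distinct_property"
  value     = "3"
}
```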

Monitoring and integration

Configure deployment monitoring

Set up comprehensive monitoring for deployments with Consul and external monitoring systems. For detailed monitoring setup, see our guide on monitoring Consul with Prometheus and Grafana.

#!/bin/bash
# Deployment monitoring script
set -e

JOB_NAME="$1"
DEPLOYMENT_ID=$(nomad job deployments -json "$JOB_NAME" | jq -r '.[0].ID')
echo "Monitoring deployment: $DEPLOYMENT_ID"

while true; do
  STATUS=$(nomad deployment status -json "$DEPLOYMENT_ID" | jq -r '.Status')
  HEALTHY=$(nomad deployment status -json "$DEPLOYMENT_ID" | jq -r '.TaskGroups | to_entries[] | .value.HealthyAllocs')
  DESIRED=$(nomad deployment status -json "$DEPLOYMENT_ID" | jq -r '.TaskGroups | to_entries[] | .value.DesiredTotal')
  echo "Status: $STATUS, Healthy: $HEALTHY/$DESIRED"

  if [ "$STATUS" = "successful" ]; then
    echo "Deployment completed successfully"
    exit 0
  elif [ "$STATUS" = "failed" ]; then
    echo "Deployment failed"
    exit 1
  fi

  sleep 10
done

Set up automated testing

Implement automated testing pipeline for deployment validation and health verification.

#!/bin/bash
# Automated deployment testing
set -e

SERVICE_NAME="$1"
TEST_URL="$2"
echo "Testing deployment for service: $SERVICE_NAME"

# Wait for service registration
echo "Waiting for service registration..."
timeout 120 bash -c "until consul catalog services | grep -q $SERVICE_NAME; do sleep 5; done"

# Get service endpoints from the Consul catalog API
SERVICE_IPS=$(curl -s "http://127.0.0.1:8500/v1/catalog/service/$SERVICE_NAME" | jq -r '.[].ServiceAddress')

for IP in $SERVICE_IPS; do
  echo "Testing endpoint: http://$IP:8080"

  # Health check
  curl -f "http://$IP:8080/health" || exit 1

  # Load test
  ab -n 100 -c 10 "http://$IP:8080$TEST_URL" || exit 1

  # Response time check
  RESPONSE_TIME=$(curl -o /dev/null -s -w '%{time_total}' "http://$IP:8080$TEST_URL")
  if (( $(echo "$RESPONSE_TIME > 1.0" | bc -l) )); then
    echo "Response time too high: $RESPONSE_TIME seconds"
    exit 1
  fi
done

echo "All tests passed for $SERVICE_NAME"

Verify your setup

# Check Nomad cluster status
nomad server members
nomad node status

# Verify job deployments
nomad job status
nomad deployment list

# Check service discovery
consul catalog services
consul members

# Test service endpoints
curl -I http://app.example.com/health

# Monitor cluster activity
nomad monitor -log-level=INFO

Common issues

Symptom | Cause | Fix
Deployment stuck in progress | Health checks failing | Check service logs with nomad alloc logs [ALLOC_ID] and verify health check endpoints
Canary not auto-promoting | Metrics threshold not met | Review canary metrics and adjust thresholds in the job specification
Service not registering in Consul | Consul agent connectivity | Verify the Consul agent is running and reachable: consul members
Scheduling constraints not working | Node metadata missing | Add the required metadata to client nodes: meta { instance_type = "web" }
Rolling update failures | Insufficient healthy instances | Increase min_healthy_time and verify resource availability
Blue-green switch not working | Load balancer configuration | Update service tags and verify Traefik or load balancer rules
