Install and configure Ollama for local AI models on Linux servers

Intermediate 25 min Mar 31, 2026
Ubuntu 24.04 Ubuntu 22.04 Debian 12 AlmaLinux 9 Rocky Linux 9 Fedora 41

Set up Ollama to run large language models locally on your Linux server. This tutorial covers installation, GPU acceleration, model deployment, API configuration, and performance optimization.

Prerequisites

  • Root or sudo access
  • 8GB RAM minimum
  • 20GB free disk space
  • NVIDIA GPU (optional)

What this solves

Ollama allows you to run large language models like Llama 3, Code Llama, and Mistral directly on your Linux server without relying on external APIs. This gives you complete control over your AI infrastructure, ensures data privacy, and eliminates per-request costs for AI operations.

Step-by-step installation

Update system packages

Start by updating your package manager so you have the latest security patches and dependencies. The commands in this tutorial target apt-based systems (Ubuntu, Debian); on AlmaLinux, Rocky Linux, or Fedora, substitute the dnf equivalents (for example, sudo dnf upgrade -y).

sudo apt update && sudo apt upgrade -y

Install required dependencies

Install curl and other essential tools needed for the Ollama installation script.

sudo apt install -y curl wget gnupg lsb-release

Install NVIDIA GPU drivers (optional)

If you have an NVIDIA GPU, install the proprietary drivers for hardware acceleration. Skip this step for CPU-only installations.

sudo apt install -y nvidia-driver-535 nvidia-utils-535
sudo reboot

Install Ollama

Download and run the official Ollama installation script, which will set up the binary and systemd service automatically.

curl -fsSL https://ollama.com/install.sh | sh

Create Ollama system user

The installer normally creates an ollama user; the commands below create it only if it is missing, then ensure the model directory exists with the correct ownership.

id ollama >/dev/null 2>&1 || sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo mkdir -p /usr/share/ollama/.ollama
sudo chown -R ollama:ollama /usr/share/ollama

Configure system resources

Increase system limits for the ollama user to handle large model files and concurrent connections. Add the following to /etc/security/limits.conf (or a file under /etc/security/limits.d/):

ollama soft nofile 65536
ollama hard nofile 65536
ollama soft nproc 4096
ollama hard nproc 4096

Configure Ollama service

Edit the systemd unit, which the installer places at /etc/systemd/system/ollama.service, to control memory usage, API binding, and GPU access:

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_MODELS=/usr/share/ollama/.ollama/models"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_MAX_LOADED_MODELS=1"

[Install]
WantedBy=default.target
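Rather than editing the installed unit file directly, the same environment settings can go in a systemd drop-in, which survives Ollama upgrades. A minimal sketch (only the keys you override need to appear):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# Drop-in override: these Environment lines are merged into the
# installer-provided ollama.service at daemon-reload time.
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
```

After creating the file, apply it with sudo systemctl daemon-reload && sudo systemctl restart ollama.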

Enable and start Ollama service

Enable the service to start automatically on boot and start it immediately.

sudo systemctl daemon-reload
sudo systemctl enable --now ollama
sudo systemctl status ollama

Configure firewall access

Open port 11434 for API access, restricting it to a trusted subnet (adjust 192.168.1.0/24 to match your network). Do not expose the port to the public internet without authentication.

sudo ufw allow from 192.168.1.0/24 to any port 11434
sudo ufw reload

On RHEL-family systems that ship firewalld instead of ufw:

sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="11434" protocol="tcp" accept'
sudo firewall-cmd --reload

Deploy and manage AI models

Download your first model

Pull a lightweight model like Llama 3.2 3B to test your installation. This will download approximately 2GB of data.

ollama pull llama3.2:3b

Test model inference

Run a simple query to verify the model is working correctly and responding to prompts.

ollama run llama3.2:3b "What is the capital of France?"

List available models

View all downloaded models and their sizes to manage disk space effectively.

ollama list

Download additional models

Install other useful models for different tasks. Code Llama is optimized for programming tasks.

ollama pull codellama:7b
ollama pull mistral:7b

Set up API access and authentication

Test REST API access

Verify the API is accessible from other machines on your network using curl (replace 203.0.113.10 with your server's address):

curl -X POST http://203.0.113.10:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
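The same request can be scripted from the shell. A minimal sketch that builds the payload in a variable and sanity-checks it as JSON before sending (the model name and localhost address are examples; the curl line assumes a running Ollama instance, so it is left commented out):

```shell
# Build the request body for POST /api/generate
payload='{"model":"llama3.2:3b","prompt":"Why is the sky blue?","stream":false}'

# Verify the payload is well-formed JSON before sending it
echo "$payload" | python3 -m json.tool >/dev/null && echo "payload OK"

# Send the request (requires a reachable Ollama server):
# curl -s -X POST http://localhost:11434/api/generate -d "$payload"
```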

Configure API authentication with reverse proxy

Set up basic authentication using nginx to secure your Ollama API endpoint.

server {
    listen 443 ssl;
    server_name ollama.example.com;
    
    ssl_certificate /etc/ssl/certs/ollama.pem;
    ssl_certificate_key /etc/ssl/private/ollama.key;
    
    auth_basic "Ollama API";
    auth_basic_user_file /etc/nginx/.htpasswd;
    
    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Create API authentication credentials

Generate a password file for basic authentication to restrict API access to authorized users only.

sudo apt install -y apache2-utils
sudo htpasswd -c /etc/nginx/.htpasswd ollama-user
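If apache2-utils is unavailable, an htpasswd-compatible entry can also be generated with openssl's apr1 (Apache MD5) hash; the username and password below are placeholders:

```shell
# Generate an htpasswd-style "user:hash" line without apache2-utils
entry="ollama-user:$(openssl passwd -apr1 changeme)"
echo "$entry"

# Append it to nginx's password file:
# echo "$entry" | sudo tee -a /etc/nginx/.htpasswd
```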

Performance tuning and monitoring

Monitor system resources

Install htop and iotop to monitor CPU, memory, and disk I/O during model inference. On GPU systems, nvidia-smi (installed alongside the NVIDIA drivers) reports GPU utilization and VRAM usage.

sudo apt install -y htop iotop
htop

Optimize memory usage

Configure Ollama to unload idle models automatically and free memory for other processes. Add these variables to the [Service] section, then run sudo systemctl daemon-reload && sudo systemctl restart ollama. OLLAMA_KEEP_ALIVE sets how long a model stays loaded after its last request; raise OLLAMA_NUM_PARALLEL above 1 only if you have memory to spare, since concurrent requests multiply usage.

[Service]
Environment="OLLAMA_KEEP_ALIVE=5m"
Environment="OLLAMA_NUM_PARALLEL=2"

Set up log rotation

By default the systemd service logs to the journal (view it with journalctl -u ollama). If you redirect output to files under /var/log/ollama, add a policy like the following to /etc/logrotate.d/ollama to prevent disk space issues:

/var/log/ollama/*.log {
    daily
    missingok
    rotate 7
    compress
    delaycompress
    notifempty
    copytruncate
}

Create monitoring script

Build a simple monitoring script to track model status and system health. Save it as /usr/local/bin/ollama-monitor.sh:

#!/bin/bash
echo "=== Ollama Status ==="
sudo systemctl status ollama --no-pager
echo
echo "=== Model List ==="
ollama list
echo
echo "=== System Resources ==="
free -h
df -h /usr/share/ollama

Make monitoring script executable

Set proper permissions for the monitoring script and test its execution.

sudo chmod 755 /usr/local/bin/ollama-monitor.sh
/usr/local/bin/ollama-monitor.sh
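To capture snapshots on a schedule, a cron entry like the following (the script path matches where it was saved above; the log destination is just an example) runs it every 15 minutes:

```
*/15 * * * * /usr/local/bin/ollama-monitor.sh >> /var/log/ollama-monitor.log 2>&1
```

Add the line with crontab -e.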

Verify your setup

Run these commands to confirm Ollama is working correctly and ready for production use.

sudo systemctl status ollama
ollama --version
ollama list
curl -s http://localhost:11434/api/tags | grep -o '"name":"[^"]*' | cut -d'"' -f4
nvidia-smi

Common issues

Symptom | Cause | Fix
Service won't start | Port 11434 already in use | sudo lsof -i :11434 to find the conflicting process
Model download fails | Insufficient disk space | df -h to check space; clean up with ollama rm model-name
GPU not detected | NVIDIA drivers not installed | Install drivers with sudo apt install nvidia-driver-535
API returns 404 | Model not loaded | Run ollama pull model-name to download the model
High memory usage | Multiple models loaded | Set OLLAMA_MAX_LOADED_MODELS=1 in the service config
Slow inference | Running on CPU instead of GPU | Check nvidia-smi and verify the CUDA installation


#ollama #ai #llm #gpu #machine-learning

Need help?

Don't want to manage this yourself?

We handle infrastructure for businesses that depend on uptime. From initial setup to ongoing operations.

Talk to an engineer