Set up Ollama to run large language models locally on your Linux server. This tutorial covers installation, GPU acceleration, model deployment, API configuration, and performance optimization.
Prerequisites
- Root or sudo access
- 8GB RAM minimum
- 20GB free disk space
- NVIDIA GPU (optional)
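Before starting, you can sanity-check the hardware requirements with a short script. This is a minimal sketch: the 8 GB / 20 GB thresholds mirror the list above, and checking the `/` mount point is an assumption to adjust if your models will live on another filesystem.

```shell
# Preflight check against the prerequisites above. Thresholds (8 GB RAM,
# 20 GB free disk) mirror the list; checking "/" is an assumption.
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
disk_kb=$(df -k / | awk 'NR==2 {print $4}')
echo "RAM: $((mem_kb / 1024 / 1024)) GiB, free disk: $((disk_kb / 1024 / 1024)) GiB"
[ "$mem_kb" -ge $((8 * 1024 * 1024)) ] || echo "warning: less than 8GB RAM"
[ "$disk_kb" -ge $((20 * 1024 * 1024)) ] || echo "warning: less than 20GB free disk"
```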
What this solves
Ollama allows you to run large language models like Llama 3, Code Llama, and Mistral directly on your Linux server without relying on external APIs. This gives you complete control over your AI infrastructure, ensures data privacy, and eliminates per-request costs for AI operations.
Step-by-step installation
Update system packages
Start by updating your system packages to ensure you have the latest security patches and dependencies.
On Ubuntu/Debian:

sudo apt update && sudo apt upgrade -y

On RHEL/Fedora:

sudo dnf update -y

Install required dependencies
Install curl and other essential tools needed for the Ollama installation script.
On Ubuntu/Debian:

sudo apt install -y curl wget gnupg lsb-release

On RHEL/Fedora:

sudo dnf install -y curl wget gnupg

Install NVIDIA GPU drivers (optional)
If you have an NVIDIA GPU, install the proprietary drivers for hardware acceleration. Skip this step for CPU-only installations.
On Ubuntu/Debian:

sudo apt install -y nvidia-driver-535 nvidia-utils-535
sudo reboot

On RHEL/Fedora:

sudo dnf install -y akmod-nvidia xorg-x11-drv-nvidia-cuda
sudo reboot

Install Ollama
Download and run the official Ollama installation script, which will set up the binary and systemd service automatically.
curl -fsSL https://ollama.ai/install.sh | sh

Create Ollama system user
The installer creates an ollama user, but verify it exists and has the correct permissions for model storage.
id ollama >/dev/null 2>&1 || sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo mkdir -p /usr/share/ollama/.ollama
sudo chown -R ollama:ollama /usr/share/ollama

Configure system resources
Increase system limits for the ollama user to handle large model files and concurrent connections. Add the following to /etc/security/limits.d/ollama.conf:
ollama soft nofile 65536
ollama hard nofile 65536
ollama soft nproc 4096
ollama hard nproc 4096

Configure Ollama service
Create a systemd service configuration at /etc/systemd/system/ollama.service to control memory usage, API binding, and GPU access.
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_MODELS=/usr/share/ollama/.ollama/models"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
[Install]
WantedBy=default.target

Enable and start Ollama service
Enable the service to start automatically on boot and start it immediately.
sudo systemctl daemon-reload
sudo systemctl enable --now ollama
sudo systemctl status ollama

Configure firewall access
Open port 11434 for API access. Restrict access to specific IP ranges in production environments.
On Ubuntu/Debian:

sudo ufw allow from 192.168.1.0/24 to any port 11434
sudo ufw reload

On RHEL/Fedora:

sudo firewall-cmd --permanent --add-rich-rule="rule family=ipv4 source address=192.168.1.0/24 port protocol=tcp port=11434 accept"
sudo firewall-cmd --reload

Deploy and manage AI models
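Before pulling anything, note where models live: the OLLAMA_MODELS path set in the service file above. A quick disk-usage check (the path is assumed from that configuration; it falls back to a message before any model has been pulled):

```shell
# Models live under the OLLAMA_MODELS path configured in the service file;
# report their disk usage, with a fallback message before any pulls.
MODELS_DIR="${OLLAMA_MODELS:-/usr/share/ollama/.ollama/models}"
du -sh "$MODELS_DIR" 2>/dev/null || echo "no models yet at $MODELS_DIR"
```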
Download your first model
Pull a lightweight model like Llama 3.2 3B to test your installation. This will download approximately 2GB of data.
ollama pull llama3.2:3b

Test model inference
Run a simple query to verify the model is working correctly and responding to prompts.
ollama run llama3.2:3b "What is the capital of France?"

List available models
View all downloaded models and their sizes to manage disk space effectively.
ollama list

Download additional models
Install other useful models for different tasks. Code Llama is optimized for programming tasks.
ollama pull codellama:7b
ollama pull mistral:7b

Set up API access and authentication
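Remote clients can use either raw HTTP or the ollama CLI itself: the CLI honors the OLLAMA_HOST environment variable, so pointing a workstation at the server is a one-line export (203.0.113.10 is the example server address used in this section):

```shell
# Point a workstation's ollama CLI at the remote server (example address):
export OLLAMA_HOST=http://203.0.113.10:11434
echo "ollama CLI now targets: $OLLAMA_HOST"
# e.g. `ollama list` and `ollama run` now talk to that host over the network
```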
Test REST API access
Verify the API is accessible from other machines on your network using curl.
curl -X POST http://203.0.113.10:11434/api/generate -d '{
"model": "llama3.2:3b",
"prompt": "Why is the sky blue?",
"stream": false
}'

Configure API authentication with reverse proxy
Set up basic authentication using nginx to secure your Ollama API endpoint.
server {
listen 443 ssl;
server_name ollama.example.com;
ssl_certificate /etc/ssl/certs/ollama.pem;
ssl_certificate_key /etc/ssl/private/ollama.key;
auth_basic "Ollama API";
auth_basic_user_file /etc/nginx/.htpasswd;
location / {
proxy_pass http://127.0.0.1:11434;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}

Create API authentication credentials
Generate a password file for basic authentication to restrict API access to authorized users only.
On Ubuntu/Debian:

sudo apt install -y apache2-utils
sudo htpasswd -c /etc/nginx/.htpasswd ollama-user

On RHEL/Fedora:

sudo dnf install -y httpd-tools
sudo htpasswd -c /etc/nginx/.htpasswd ollama-user

Performance tuning and monitoring
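The tuning steps below work mainly through environment variables on the Ollama service. As a sketch of the main knobs (values here are illustrative; on a real server they belong in the systemd unit, as the memory-usage step below shows, not in an interactive shell):

```shell
# Main Ollama tuning variables (illustrative values; set them in the
# systemd unit on a real server, not in an interactive shell):
export OLLAMA_NUM_PARALLEL=2        # concurrent requests served per loaded model
export OLLAMA_MAX_LOADED_MODELS=1   # models kept in memory at once
export OLLAMA_KEEP_ALIVE=5m         # idle time before a model is unloaded
echo "parallel=$OLLAMA_NUM_PARALLEL max_loaded=$OLLAMA_MAX_LOADED_MODELS keep_alive=$OLLAMA_KEEP_ALIVE"
```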
Monitor system resources
Install htop and iotop to monitor CPU, memory, and disk I/O during model inference; GPU utilization comes from nvidia-smi, which ships with the NVIDIA drivers.
On Ubuntu/Debian:

sudo apt install -y htop iotop
htop

On RHEL/Fedora:

sudo dnf install -y htop iotop
htop

Optimize memory usage
Configure Ollama to automatically unload idle models and free system memory for other processes. Apply these settings as a service override (for example via sudo systemctl edit ollama), then restart the service:
[Service]
Environment="OLLAMA_KEEP_ALIVE=5m"
Environment="OLLAMA_MAX_VRAM=8gb"
Environment="OLLAMA_NUM_PARALLEL=2"

Set up log rotation
Configure logrotate to manage Ollama service logs and prevent disk space issues. Save the policy below as /etc/logrotate.d/ollama:
/var/log/ollama/*.log {
daily
missingok
rotate 7
compress
delaycompress
notifempty
copytruncate
}

Create monitoring script
Build a simple monitoring script to track model status and system health. Save it as /usr/local/bin/ollama-monitor.sh:
#!/bin/bash
echo "=== Ollama Status ==="
sudo systemctl status ollama --no-pager
echo
echo "=== Model List ==="
ollama list
echo
echo "=== System Resources ==="
free -h
df -h /usr/share/ollama

Make monitoring script executable
Set proper permissions for the monitoring script and test its execution.
sudo chmod 755 /usr/local/bin/ollama-monitor.sh
/usr/local/bin/ollama-monitor.sh

Verify your setup
Run these commands to confirm Ollama is working correctly and ready for production use.
sudo systemctl status ollama
ollama --version
ollama list
curl -s http://localhost:11434/api/tags | grep -o '"name":"[^"]*' | cut -d'"' -f4
nvidia-smi

Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Service won't start | Port 11434 already in use | sudo lsof -i :11434 to find conflicting process |
| Model download fails | Insufficient disk space | df -h to check space, clean up with ollama rm model-name |
| GPU not detected | NVIDIA drivers not installed | Install drivers with sudo apt install nvidia-driver-535 |
| API returns 404 | Model not loaded | Run ollama pull model-name to download the model |
| High memory usage | Multiple models loaded | Set OLLAMA_MAX_LOADED_MODELS=1 in service config |
| Slow inference | Running on CPU instead of GPU | Check nvidia-smi and verify CUDA installation |
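For the connectivity-related symptoms above, it helps to separate "service down" from "network blocked" before digging further. A minimal reachability probe using bash's built-in /dev/tcp (no curl required; host and port assume the defaults from this guide):

```shell
# Distinguish "service down" from "network blocked": probe the Ollama port
# using bash's /dev/tcp pseudo-device, with a one-second timeout.
check_port() {
  local host=$1 port=$2
  if timeout 1 bash -c "echo > /dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} unreachable"
  fi
}
check_port 127.0.0.1 11434
```

Run it locally on the server first; if the port is reachable there but not from a client machine, the firewall rules (not the service) are the place to look.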
Next steps
Automated install script
Run this script to automate the entire setup:
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Default values
OLLAMA_PORT="${OLLAMA_PORT:-11434}"
OLLAMA_HOST="${OLLAMA_HOST:-0.0.0.0}"
NETWORK_RANGE="${NETWORK_RANGE:-192.168.1.0/24}"
INSTALL_GPU_DRIVERS="${INSTALL_GPU_DRIVERS:-false}"
INITIAL_MODEL="${INITIAL_MODEL:-llama3.2:3b}"
# Usage function
usage() {
echo "Usage: $0 [OPTIONS]"
echo "Options:"
echo " --port PORT Port for Ollama service (default: 11434)"
echo " --host HOST Host binding (default: 0.0.0.0)"
echo " --network CIDR Network range for firewall (default: 192.168.1.0/24)"
echo " --gpu Install NVIDIA GPU drivers"
echo " --model MODEL Initial model to download (default: llama3.2:3b)"
echo " --help Show this help message"
exit 1
}
# Parse arguments
while [[ $# -gt 0 ]]; do
case $1 in
--port) OLLAMA_PORT="$2"; shift 2 ;;
--host) OLLAMA_HOST="$2"; shift 2 ;;
--network) NETWORK_RANGE="$2"; shift 2 ;;
--gpu) INSTALL_GPU_DRIVERS="true"; shift ;;
--model) INITIAL_MODEL="$2"; shift 2 ;;
--help) usage ;;
*) echo "Unknown option: $1"; usage ;;
esac
done
# Logging functions
log_info() { echo -e "${BLUE}[INFO]${NC} $1"; }
log_success() { echo -e "${GREEN}[SUCCESS]${NC} $1"; }
log_warning() { echo -e "${YELLOW}[WARNING]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }
# Cleanup function
cleanup() {
local exit_code=$?
if [ $exit_code -ne 0 ]; then
log_error "Installation failed. Cleaning up..."
systemctl stop ollama 2>/dev/null || true
systemctl disable ollama 2>/dev/null || true
rm -f /etc/systemd/system/ollama.service
userdel -r ollama 2>/dev/null || true
rm -f /usr/local/bin/ollama
fi
exit $exit_code
}
trap cleanup ERR
# Check if running as root or with sudo
if [[ $EUID -ne 0 ]]; then
log_error "This script must be run as root or with sudo"
exit 1
fi
# Detect distribution
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_INSTALL="apt install -y"
PKG_UPDATE="apt update && apt upgrade -y"
FIREWALL_CMD="ufw"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_INSTALL="dnf install -y"
PKG_UPDATE="dnf update -y"
FIREWALL_CMD="firewall-cmd"
;;
amzn)
PKG_MGR="yum"
PKG_INSTALL="yum install -y"
PKG_UPDATE="yum update -y"
FIREWALL_CMD="firewall-cmd"
;;
*)
log_error "Unsupported distribution: $ID"
exit 1
;;
esac
else
log_error "Cannot detect distribution. /etc/os-release not found."
exit 1
fi
log_info "Detected distribution: $ID"
echo -e "${GREEN}Starting Ollama installation on $ID${NC}"
# Step 1: Update system packages
log_info "[1/9] Updating system packages..."
$PKG_UPDATE
log_success "System packages updated"
# Step 2: Install required dependencies
log_info "[2/9] Installing required dependencies..."
$PKG_INSTALL curl wget gnupg
if [[ "$ID" =~ ^(ubuntu|debian)$ ]]; then
$PKG_INSTALL lsb-release
fi
log_success "Dependencies installed"
# Step 3: Install NVIDIA GPU drivers (optional)
if [ "$INSTALL_GPU_DRIVERS" = "true" ]; then
log_info "[3/9] Installing NVIDIA GPU drivers..."
if [[ "$ID" =~ ^(ubuntu|debian)$ ]]; then
$PKG_INSTALL nvidia-driver-535 nvidia-utils-535
else
$PKG_INSTALL akmod-nvidia xorg-x11-drv-nvidia-cuda
fi
log_warning "GPU drivers installed. You may need to reboot for changes to take effect."
else
log_info "[3/9] Skipping GPU driver installation"
fi
# Step 4: Install Ollama
log_info "[4/9] Installing Ollama..."
curl -fsSL https://ollama.ai/install.sh | sh
log_success "Ollama binary installed"
# Step 5: Create Ollama system user and directories
log_info "[5/9] Configuring Ollama user and directories..."
if ! id ollama >/dev/null 2>&1; then
useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
fi
mkdir -p /usr/share/ollama/.ollama/models
chown -R ollama:ollama /usr/share/ollama
chmod 755 /usr/share/ollama
chmod 755 /usr/share/ollama/.ollama
chmod 755 /usr/share/ollama/.ollama/models
log_success "Ollama user and directories configured"
# Step 6: Configure system limits
log_info "[6/9] Configuring system limits..."
cat > /etc/security/limits.d/ollama.conf << 'EOF'
ollama soft nofile 65536
ollama hard nofile 65536
ollama soft nproc 4096
ollama hard nproc 4096
EOF
log_success "System limits configured"
# Step 7: Configure Ollama systemd service
log_info "[7/9] Configuring Ollama systemd service..."
cat > /etc/systemd/system/ollama.service << EOF
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="OLLAMA_HOST=${OLLAMA_HOST}:${OLLAMA_PORT}"
Environment="OLLAMA_MODELS=/usr/share/ollama/.ollama/models"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
[Install]
WantedBy=default.target
EOF
systemctl daemon-reload
systemctl enable ollama
systemctl start ollama
log_success "Ollama service configured and started"
# Step 8: Configure firewall
log_info "[8/9] Configuring firewall..."
if [[ "$FIREWALL_CMD" == "ufw" ]]; then
if command -v ufw >/dev/null 2>&1; then
ufw --force enable 2>/dev/null || true
ufw allow from "$NETWORK_RANGE" to any port "$OLLAMA_PORT"
ufw reload 2>/dev/null || true
fi
elif [[ "$FIREWALL_CMD" == "firewall-cmd" ]]; then
if command -v firewall-cmd >/dev/null 2>&1; then
systemctl enable firewalld 2>/dev/null || true
systemctl start firewalld 2>/dev/null || true
firewall-cmd --permanent --add-rich-rule="rule family=ipv4 source address=$NETWORK_RANGE port protocol=tcp port=$OLLAMA_PORT accept" 2>/dev/null || true
firewall-cmd --reload 2>/dev/null || true
fi
fi
log_success "Firewall configured"
# Step 9: Download initial model and verify installation
log_info "[9/9] Downloading initial model and verifying installation..."
sleep 5 # Wait for service to be fully ready
# Set up environment for ollama commands
export OLLAMA_HOST="$OLLAMA_HOST:$OLLAMA_PORT"
# Download initial model as ollama user
sudo -u ollama -E /usr/local/bin/ollama pull "$INITIAL_MODEL"
log_success "Initial model '$INITIAL_MODEL' downloaded"
# Verify installation
log_info "Verifying installation..."
# Check service status
if systemctl is-active ollama >/dev/null 2>&1; then
log_success "Ollama service is running"
else
log_error "Ollama service is not running"
exit 1
fi
# Check API endpoint
if curl -s "http://localhost:$OLLAMA_PORT/api/tags" >/dev/null; then
log_success "Ollama API is responding"
else
log_error "Ollama API is not responding"
exit 1
fi
# Final success message
echo
log_success "Ollama installation completed successfully!"
echo
echo -e "${GREEN}Next steps:${NC}"
echo "1. Test the installation:"
echo " ollama run $INITIAL_MODEL \"Hello, how are you?\""
echo
echo "2. Access the API from other machines:"
echo " curl -X POST http://$(hostname -I | awk '{print $1}'):$OLLAMA_PORT/api/generate -d '{\"model\": \"$INITIAL_MODEL\", \"prompt\": \"Hello\", \"stream\": false}'"
echo
echo "3. Download additional models:"
echo " ollama pull codellama:7b"
echo " ollama pull mistral:7b"
echo
echo -e "${BLUE}Service management:${NC}"
echo " systemctl status ollama"
echo " systemctl restart ollama"
echo " journalctl -u ollama -f"
Review the script before running it. Save it as install.sh and execute it with: sudo bash install.sh