Install and configure Ollama for local AI models on Linux servers

Intermediate 25 min Mar 31, 2026
Ubuntu 24.04 Ubuntu 22.04 Debian 12 AlmaLinux 9 Rocky Linux 9 Fedora 41

Set up Ollama to run large language models locally on your Linux server. This tutorial covers installation, GPU acceleration, model deployment, API configuration, and performance optimization.

Prerequisites

  • Root or sudo access
  • 8GB RAM minimum
  • 20GB free disk space
  • NVIDIA GPU (optional)

What this solves

Ollama allows you to run large language models like Llama 3, Code Llama, and Mistral directly on your Linux server without relying on external APIs. This gives you complete control over your AI infrastructure, ensures data privacy, and eliminates per-request costs for AI operations.

Step-by-step installation

Update system packages

Start by updating your package manager so you have the latest security patches and dependencies. The commands in this tutorial target apt-based systems (Ubuntu, Debian); on AlmaLinux, Rocky Linux, or Fedora, substitute the dnf equivalents (for example, sudo dnf upgrade -y).

sudo apt update && sudo apt upgrade -y

Install required dependencies

Install curl and other essential tools needed for the Ollama installation script.

sudo apt install -y curl wget gnupg lsb-release

Install NVIDIA GPU drivers (optional)

If you have an NVIDIA GPU, install the proprietary drivers for hardware acceleration. Skip this step for CPU-only installations.

sudo apt install -y nvidia-driver-535 nvidia-utils-535
sudo reboot

Install Ollama

Download and run the official Ollama installation script, which will set up the binary and systemd service automatically.

curl -fsSL https://ollama.com/install.sh | sh

Create Ollama system user

The installer normally creates an ollama user; the commands below create it only if it is missing, then ensure the model directory exists with the correct ownership.

id ollama >/dev/null 2>&1 || sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo mkdir -p /usr/share/ollama/.ollama
sudo chown -R ollama:ollama /usr/share/ollama

Configure system resources

Increase system limits for the ollama user to handle large model files and concurrent connections. Add the following to /etc/security/limits.conf (or a file under /etc/security/limits.d/):

ollama soft nofile 65536
ollama hard nofile 65536
ollama soft nproc 4096
ollama hard nproc 4096

Configure Ollama service

Edit the systemd unit, which the installer places at /etc/systemd/system/ollama.service, to control memory usage, API binding, and GPU access:

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_MODELS=/usr/share/ollama/.ollama/models"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_MAX_LOADED_MODELS=1"

[Install]
WantedBy=default.target
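Rather than editing the installed unit file directly, the same environment settings can go in a systemd drop-in, which survives Ollama upgrades. A minimal sketch (only the keys you override need to appear):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# Drop-in override: these Environment lines are merged into the
# installer-provided ollama.service at daemon-reload time.
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_NUM_PARALLEL=1"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
```

After creating the file, apply it with sudo systemctl daemon-reload && sudo systemctl restart ollama.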

Enable and start Ollama service

Enable the service to start automatically on boot and start it immediately.

sudo systemctl daemon-reload
sudo systemctl enable --now ollama
sudo systemctl status ollama

Configure firewall access

Open port 11434 for API access, restricting it to a trusted subnet (adjust 192.168.1.0/24 to match your network). Do not expose the port to the public internet without authentication.

sudo ufw allow from 192.168.1.0/24 to any port 11434
sudo ufw reload

On RHEL-family systems that ship firewalld instead of ufw:

sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="11434" protocol="tcp" accept'
sudo firewall-cmd --reload

Deploy and manage AI models

Download your first model

Pull a lightweight model like Llama 3.2 3B to test your installation. This will download approximately 2GB of data.

ollama pull llama3.2:3b

Test model inference

Run a simple query to verify the model is working correctly and responding to prompts.

ollama run llama3.2:3b "What is the capital of France?"

List available models

View all downloaded models and their sizes to manage disk space effectively.

ollama list

Download additional models

Install other useful models for different tasks. Code Llama is optimized for programming tasks.

ollama pull codellama:7b
ollama pull mistral:7b

Set up API access and authentication

Test REST API access

Verify the API is accessible from other machines on your network using curl (replace 203.0.113.10 with your server's address):

curl -X POST http://203.0.113.10:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
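The same request can be scripted from the shell. A minimal sketch that builds the payload in a variable and sanity-checks it as JSON before sending (the model name and localhost address are examples; the curl line assumes a running Ollama instance, so it is left commented out):

```shell
# Build the request body for POST /api/generate
payload='{"model":"llama3.2:3b","prompt":"Why is the sky blue?","stream":false}'

# Verify the payload is well-formed JSON before sending it
echo "$payload" | python3 -m json.tool >/dev/null && echo "payload OK"

# Send the request (requires a reachable Ollama server):
# curl -s -X POST http://localhost:11434/api/generate -d "$payload"
```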

Configure API authentication with reverse proxy

Set up basic authentication using nginx to secure your Ollama API endpoint.

server {
    listen 443 ssl;
    server_name ollama.example.com;
    
    ssl_certificate /etc/ssl/certs/ollama.pem;
    ssl_certificate_key /etc/ssl/private/ollama.key;
    
    auth_basic "Ollama API";
    auth_basic_user_file /etc/nginx/.htpasswd;
    
    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Create API authentication credentials

Generate a password file for basic authentication to restrict API access to authorized users only.

sudo apt install -y apache2-utils
sudo htpasswd -c /etc/nginx/.htpasswd ollama-user
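If apache2-utils is unavailable, an htpasswd-compatible entry can also be generated with openssl's apr1 (Apache MD5) hash; the username and password below are placeholders:

```shell
# Generate an htpasswd-style "user:hash" line without apache2-utils
entry="ollama-user:$(openssl passwd -apr1 changeme)"
echo "$entry"

# Append it to nginx's password file:
# echo "$entry" | sudo tee -a /etc/nginx/.htpasswd
```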

Performance tuning and monitoring

Monitor system resources

Install htop and iotop to monitor CPU, memory, and disk I/O during model inference. On GPU systems, nvidia-smi (installed alongside the NVIDIA drivers) reports GPU utilization and VRAM usage.

sudo apt install -y htop iotop
htop

Optimize memory usage

Configure Ollama to unload idle models automatically and free memory for other processes. Add these variables to the [Service] section, then run sudo systemctl daemon-reload && sudo systemctl restart ollama. OLLAMA_KEEP_ALIVE sets how long a model stays loaded after its last request; raise OLLAMA_NUM_PARALLEL above 1 only if you have memory to spare, since concurrent requests multiply usage.

[Service]
Environment="OLLAMA_KEEP_ALIVE=5m"
Environment="OLLAMA_NUM_PARALLEL=2"

Set up log rotation

By default the systemd service logs to the journal (view it with journalctl -u ollama). If you redirect output to files under /var/log/ollama, add a policy like the following to /etc/logrotate.d/ollama to prevent disk space issues:

/var/log/ollama/*.log {
    daily
    missingok
    rotate 7
    compress
    delaycompress
    notifempty
    copytruncate
}

Create monitoring script

Build a simple monitoring script to track model status and system health. Save it as /usr/local/bin/ollama-monitor.sh:

#!/bin/bash
echo "=== Ollama Status ==="
sudo systemctl status ollama --no-pager
echo
echo "=== Model List ==="
ollama list
echo
echo "=== System Resources ==="
free -h
df -h /usr/share/ollama

Make monitoring script executable

Set proper permissions for the monitoring script and test its execution.

sudo chmod 755 /usr/local/bin/ollama-monitor.sh
/usr/local/bin/ollama-monitor.sh
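To capture snapshots on a schedule, a cron entry like the following (the script path matches where it was saved above; the log destination is just an example) runs it every 15 minutes:

```
*/15 * * * * /usr/local/bin/ollama-monitor.sh >> /var/log/ollama-monitor.log 2>&1
```

Add the line with crontab -e.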

Verify your setup

Run these commands to confirm Ollama is working correctly and ready for production use.

sudo systemctl status ollama
ollama --version
ollama list
curl -s http://localhost:11434/api/tags | grep -o '"name":"[^"]*' | cut -d'"' -f4
nvidia-smi

Common issues

Symptom | Cause | Fix
Service won't start | Port 11434 already in use | sudo lsof -i :11434 to find the conflicting process
Model download fails | Insufficient disk space | df -h to check space; clean up with ollama rm model-name
GPU not detected | NVIDIA drivers not installed | Install drivers with sudo apt install nvidia-driver-535
API returns 404 | Model not loaded | Run ollama pull model-name to download the model
High memory usage | Multiple models loaded | Set OLLAMA_MAX_LOADED_MODELS=1 in the service config
Slow inference | Running on CPU instead of GPU | Check nvidia-smi and verify the CUDA installation


#ollama #ai #llm #gpu #machine-learning

Need help?

Don't want to manage this yourself?

We handle infrastructure for businesses that depend on uptime. From initial setup to ongoing operations.

Talk to an engineer