Set up automated data retention and compression policies in TimescaleDB to optimize storage usage and maintain database performance. Learn to configure drop_chunks and compression policies with monitoring.
Prerequisites
- PostgreSQL 16+ with TimescaleDB extension
- Database with appropriate user permissions
- Basic understanding of SQL and time-series data concepts
What this solves
Time-series workloads generate data that grows continuously. Without proper retention policies, your database storage fills up, performance degrades, and costs spiral out of control. Automated retention policies (built on drop_chunks) remove old data, and compression policies shrink the storage footprint while maintaining query performance.
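To get a feel for the scale, here is a rough back-of-the-envelope estimate. The row size and ingest rate below are illustrative assumptions, not measurements:

```python
# Rough storage-growth estimate for an uncompressed time-series table.
# ROW_BYTES and ROWS_PER_SECOND are illustrative assumptions.
ROW_BYTES = 100          # avg bytes per row incl. per-row overhead (assumed)
ROWS_PER_SECOND = 1000   # ingest rate across all sensors (assumed)

bytes_per_day = ROW_BYTES * ROWS_PER_SECOND * 86_400
gb_per_day = bytes_per_day / 1024**3
gb_per_year = gb_per_day * 365

print(f"~{gb_per_day:.1f} GB/day, ~{gb_per_year:.0f} GB/year")
```

At these assumed rates, a single table approaches 3 TB per year, which is why automated retention matters.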
Step-by-step configuration
Install TimescaleDB extension
First, verify TimescaleDB is installed and create the extension in your PostgreSQL database. The package below comes from Timescale's apt repository, which must already be configured (the automated install script at the end of this guide adds it).
sudo apt update
sudo apt install -y timescaledb-2-postgresql-16
Connect to your PostgreSQL database and create the TimescaleDB extension.
sudo -u postgres psql -d your_database
CREATE EXTENSION IF NOT EXISTS timescaledb;
Create a hypertable for testing
Set up a sample hypertable to demonstrate retention policies. This creates a partitioned table optimized for time-series data.
sudo -u postgres psql -d your_database -c "
CREATE TABLE sensor_data (
time TIMESTAMPTZ NOT NULL,
sensor_id INTEGER,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
SELECT create_hypertable('sensor_data', 'time', chunk_time_interval => INTERVAL '1 day');
"
Insert sample data across multiple time periods
Generate test data spanning several months to demonstrate retention policy effectiveness.
sudo -u postgres psql -d your_database -c "
INSERT INTO sensor_data
SELECT
generate_series(
NOW() - INTERVAL '6 months',
NOW(),
INTERVAL '1 hour'
) as time,
(random() * 100)::INTEGER as sensor_id,
(random() * 40 + 10) as temperature,
(random() * 100) as humidity;
"
Configure automated data retention policy
Create a retention policy that automatically drops chunks older than 90 days. This runs daily and removes old data permanently.
sudo -u postgres psql -d your_database -c "
SELECT add_retention_policy('sensor_data', INTERVAL '90 days');
"
For more granular control, specify the job schedule and timezone. A hypertable can have only one retention policy, so drop the one created above before re-adding it.
sudo -u postgres psql -d your_database -c "
SELECT remove_retention_policy('sensor_data', if_exists => true);
SELECT add_retention_policy(
'sensor_data',
INTERVAL '90 days',
schedule_interval => INTERVAL '1 day',
timezone => 'UTC'
);
"
Set up compression policies for storage optimization
Enable compression on chunks older than 7 days to reduce storage usage while maintaining read performance.
sudo -u postgres psql -d your_database -c "
ALTER TABLE sensor_data SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'sensor_id',
timescaledb.compress_orderby = 'time'
);
"
Add the compression policy to automatically compress old chunks.
sudo -u postgres psql -d your_database -c "
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');
"
Configure different retention policies for multiple tables
Set up varying retention periods based on data importance and compliance requirements.
sudo -u postgres psql -d your_database -c "
-- Critical metrics: keep for 2 years
SELECT add_retention_policy('critical_metrics', INTERVAL '2 years');
-- Application logs: keep for 30 days
SELECT add_retention_policy('app_logs', INTERVAL '30 days');
-- Debug data: keep for 7 days
SELECT add_retention_policy('debug_traces', INTERVAL '7 days');
"
Create custom retention conditions
Implement conditional retention based on data values, not just time, to keep important data longer while removing low-value records. Note that row-level DELETEs are much slower than dropping whole chunks and leave dead tuples for VACUUM to reclaim, so reserve this approach for tables where chunk-level retention is too coarse.
sudo -u postgres psql -d your_database -c "
-- Custom function for conditional retention
CREATE OR REPLACE FUNCTION custom_retention_filter()
RETURNS VOID AS \$\$
BEGIN
-- Delete non-ERROR logs older than 30 days
DELETE FROM app_logs
WHERE time < NOW() - INTERVAL '30 days'
AND log_level != 'ERROR';
-- ERROR logs are kept longer: delete them only after 90 days
DELETE FROM app_logs
WHERE time < NOW() - INTERVAL '90 days'
AND log_level = 'ERROR';
END;
\$\$ LANGUAGE plpgsql;
"
Schedule custom retention with pg_cron
Install the pg_cron extension for cron-style scheduling beyond TimescaleDB's built-in policies. (TimescaleDB's own add_job can also schedule custom functions; pg_cron is shown here for its familiar cron syntax.)
sudo apt install -y postgresql-16-cron
Add pg_cron to shared_preload_libraries in postgresql.conf.
shared_preload_libraries = 'timescaledb,pg_cron'
cron.database_name = 'your_database'
Restart PostgreSQL and create the extension.
sudo systemctl restart postgresql
sudo -u postgres psql -d your_database -c "CREATE EXTENSION pg_cron;"
Schedule your custom retention function to run daily at 2 AM.
sudo -u postgres psql -d your_database -c "
SELECT cron.schedule('custom-retention', '0 2 * * *', 'SELECT custom_retention_filter();');
"
Monitor retention policy performance
Create views to track retention policy execution and storage savings.
sudo -u postgres psql -d your_database -c "
-- View retention policy jobs
CREATE VIEW retention_jobs AS
SELECT
job_id,
application_name,
schedule_interval,
max_runtime,
max_retries,
retry_period,
proc_schema,
proc_name,
owner,
scheduled,
fixed_schedule,
initial_start,
hypertable_schema,
hypertable_name,
config
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention';
-- View compression statistics (before/after sizes for the hypertable)
CREATE VIEW compression_stats AS
SELECT
total_chunks,
number_compressed_chunks,
pg_size_pretty(before_compression_total_bytes) AS before_size,
pg_size_pretty(after_compression_total_bytes) AS after_size
FROM hypertable_compression_stats('sensor_data');
"
Set up retention policy alerting
Create monitoring functions to alert when retention policies fail or storage usage exceeds thresholds.
sudo -u postgres psql -d your_database -c "
-- Function to check failed retention jobs
CREATE OR REPLACE FUNCTION check_retention_health()
RETURNS TABLE(job_id INT, hypertable_name TEXT, last_run_status TEXT) AS \$\$
BEGIN
RETURN QUERY
SELECT
j.job_id,
j.hypertable_name::TEXT,
jh.last_run_status::TEXT
FROM timescaledb_information.jobs j
JOIN timescaledb_information.job_stats jh ON j.job_id = jh.job_id
WHERE j.proc_name = 'policy_retention'
AND jh.last_run_status != 'Success'
AND jh.last_run_started_at > NOW() - INTERVAL '25 hours';
END;
\$\$ LANGUAGE plpgsql;
-- Function to check storage growth
CREATE OR REPLACE FUNCTION check_storage_usage()
RETURNS TABLE(table_name TEXT, size_mb BIGINT, growth_rate NUMERIC) AS \$\$
BEGIN
RETURN QUERY
SELECT
schemaname || '.' || tablename as table_name,
pg_total_relation_size(schemaname||'.'||tablename) / (1024*1024) as size_mb,
0::NUMERIC as growth_rate -- placeholder: derive from periodic size snapshots
FROM pg_tables
WHERE schemaname NOT IN ('information_schema', 'pg_catalog', 'timescaledb_information')
AND pg_total_relation_size(schemaname||'.'||tablename) > 100 * 1024 * 1024; -- larger than 100 MB
END;
\$\$ LANGUAGE plpgsql;
"
Understanding TimescaleDB retention policies
Retention policy types
| Policy Type | Purpose | Storage Impact | Performance Impact |
|---|---|---|---|
| Drop chunks | Permanently removes old data | Major reduction | Improves query speed |
| Compression | Compresses historical data | 70-95% reduction | Slower writes, faster reads |
| Tiered storage | Moves data to cheaper storage | Cost reduction | Increased latency |
Compression algorithm selection
TimescaleDB uses different compression algorithms based on data types and patterns.
| Data Type | Algorithm | Compression Ratio | Best For |
|---|---|---|---|
| TIMESTAMPTZ | Delta encoding | 80-90% | Regular time intervals |
| INTEGER | Delta + bit packing | 70-95% | Sensor IDs, counters |
| DOUBLE PRECISION | Gorilla compression | 60-80% | Temperature, metrics |
| TEXT | Dictionary encoding | 50-90% | Status codes, labels |
Verify your setup
Check that your retention and compression policies are working correctly.
# View all configured policies
sudo -u postgres psql -d your_database -c "
SELECT
job_id,
application_name,
schedule_interval,
proc_name,
hypertable_name,
config
FROM timescaledb_information.jobs
WHERE proc_name IN ('policy_retention', 'policy_compression');"
Check policy execution history
sudo -u postgres psql -d your_database -c "
SELECT
job_id,
last_run_started_at,
last_run_status,
last_run_duration,
next_start
FROM timescaledb_information.job_stats
ORDER BY last_run_started_at DESC;"
View chunk information and compression status
sudo -u postgres psql -d your_database -c "
SELECT
chunk_name,
range_start,
range_end,
is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
ORDER BY range_start DESC;"
Check storage usage before and after policies
sudo -u postgres psql -d your_database -c "
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE tablename LIKE '%sensor_data%';"
Test retention policy health
sudo -u postgres psql -d your_database -c "SELECT * FROM check_retention_health();"
Monitor storage usage
sudo -u postgres psql -d your_database -c "SELECT * FROM check_storage_usage();"
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Retention policy not running | Background worker not enabled | Set timescaledb.max_background_workers = 8 in postgresql.conf |
| Compression failing | Active writes to old chunks | Increase compression interval or stop writes to historical data |
| High storage usage after retention | VACUUM not running | Run VACUUM FULL or enable autovacuum |
| Slow queries after compression | Poor segment/order by selection | Adjust compress_segmentby and compress_orderby based on query patterns |
| Policy job stuck | Long-running transaction blocking | Check pg_stat_activity and terminate blocking queries |
| Retention deleting wrong data | Incorrect time zone handling | Verify timezone settings and use UTC consistently |
Monitoring and managing retention policies
View retention policy execution logs
sudo -u postgres psql -d your_database -c "
SELECT
job_id,
proc_name,
pid,
start_time,
finish_time,
sqlerrcode,
err_message
FROM timescaledb_information.job_errors
WHERE finish_time > NOW() - INTERVAL '7 days'
ORDER BY start_time DESC;"
Manual policy execution for testing
Run retention policies manually to test configuration without waiting for the schedule.
# Get job ID for your retention policy
sudo -u postgres psql -d your_database -c "
SELECT job_id, hypertable_name
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention';"
Execute specific job manually (replace 1000 with actual job_id)
sudo -u postgres psql -d your_database -c "CALL run_job(1000);"
Check execution results
sudo -u postgres psql -d your_database -c "
SELECT * FROM timescaledb_information.job_stats WHERE job_id = 1000;"
Modify existing retention policies
# Change retention period (alter_job replaces the entire config JSON, so the
# simplest safe way to change drop_after is to drop and re-create the policy)
sudo -u postgres psql -d your_database -c "
SELECT remove_retention_policy('sensor_data');
SELECT add_retention_policy('sensor_data', INTERVAL '60 days');"
Change schedule frequency
sudo -u postgres psql -d your_database -c "
SELECT alter_job(1000, schedule_interval => INTERVAL '12 hours');"
Disable a retention policy temporarily
sudo -u postgres psql -d your_database -c "
SELECT alter_job(1000, scheduled => false);"
Re-enable the policy
sudo -u postgres psql -d your_database -c "
SELECT alter_job(1000, scheduled => true);"
Remove retention policies
# Remove retention policy (replace 'sensor_data' with your table)
sudo -u postgres psql -d your_database -c "
SELECT remove_retention_policy('sensor_data');"
Remove compression policy
sudo -u postgres psql -d your_database -c "
SELECT remove_compression_policy('sensor_data');"
Next steps
- Monitor PostgreSQL performance with Prometheus and Grafana for comprehensive TimescaleDB observability
- Configure PostgreSQL streaming replication to add high availability to your TimescaleDB setup
- Implement TimescaleDB backup strategies for disaster recovery and point-in-time recovery
- Set up TimescaleDB clustering for high availability with automatic failover
- Configure TimescaleDB continuous aggregates for real-time analytics dashboards
Automated install script
Run this to automate the entire setup
#!/usr/bin/env bash
set -euo pipefail
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Default values
DATABASE_NAME="${1:-timescaledb_production}"
RETENTION_DAYS="${2:-90}"
COMPRESSION_DAYS="${3:-7}"
# Usage message
show_usage() {
echo "Usage: $0 [database_name] [retention_days] [compression_days]"
echo " database_name: PostgreSQL database name (default: timescaledb_production)"
echo " retention_days: Days to keep data (default: 90)"
echo " compression_days: Days before compression (default: 7)"
echo ""
echo "Example: $0 myapp_db 180 14"
exit 1
}
# Check if help requested
if [[ "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
show_usage
fi
# Logging functions
log_info() {
echo -e "${GREEN}[INFO]${NC} $1"
}
log_warn() {
echo -e "${YELLOW}[WARN]${NC} $1"
}
log_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
# Cleanup function
cleanup() {
log_error "Installation failed. Cleaning up..."
# Remove any temporary files if created
rm -f /tmp/timescaledb_install.log
}
# Set trap for cleanup on error
trap cleanup ERR
# Check if running as root
if [[ $EUID -ne 0 ]]; then
log_error "This script must be run as root"
exit 1
fi
# Detect distribution
if [ -f /etc/os-release ]; then
. /etc/os-release
case "$ID" in
ubuntu|debian)
PKG_MGR="apt"
PKG_INSTALL="apt install -y"
PKG_UPDATE="apt update"
POSTGRES_SERVICE="postgresql"
;;
almalinux|rocky|centos|rhel|ol|fedora)
PKG_MGR="dnf"
PKG_INSTALL="dnf install -y"
PKG_UPDATE="dnf update -y"
POSTGRES_SERVICE="postgresql-16"
;;
amzn)
PKG_MGR="yum"
PKG_INSTALL="yum install -y"
PKG_UPDATE="yum update -y"
POSTGRES_SERVICE="postgresql-16"
;;
*)
log_error "Unsupported distribution: $ID"
exit 1
;;
esac
else
log_error "Cannot detect distribution"
exit 1
fi
log_info "Detected distribution: $PRETTY_NAME"
log_info "Using package manager: $PKG_MGR"
echo "[1/8] Updating package manager and installing prerequisites..."
$PKG_UPDATE
$PKG_INSTALL curl wget gnupg2 software-properties-common
echo "[2/8] Installing PostgreSQL and TimescaleDB..."
if [[ "$PKG_MGR" == "apt" ]]; then
# Add PostgreSQL official repository (apt-key is deprecated; use keyring files)
install -d /etc/apt/keyrings
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | gpg --dearmor -o /etc/apt/keyrings/pgdg.gpg
echo "deb [signed-by=/etc/apt/keyrings/pgdg.gpg] http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list
# Add TimescaleDB repository ($ID is ubuntu or debian, from /etc/os-release)
wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | gpg --dearmor -o /etc/apt/keyrings/timescaledb.gpg
echo "deb [signed-by=/etc/apt/keyrings/timescaledb.gpg] https://packagecloud.io/timescale/timescaledb/$ID/ $(lsb_release -cs) main" > /etc/apt/sources.list.d/timescaledb.list
$PKG_UPDATE
$PKG_INSTALL postgresql-16 timescaledb-2-postgresql-16
else
# RHEL-based systems
$PKG_INSTALL https://download.postgresql.org/pub/repos/yum/reporpms/EL-$(rpm -E %{rhel})-x86_64/pgdg-redhat-repo-latest.noarch.rpm
# Add TimescaleDB repository
cat > /etc/yum.repos.d/timescale_timescaledb.repo << 'EOF'
[timescale_timescaledb]
name=timescale_timescaledb
baseurl=https://packagecloud.io/timescale/timescaledb/el/$releasever/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://packagecloud.io/timescale/timescaledb/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
metadata_expire=300
EOF
$PKG_INSTALL postgresql16-server timescaledb_16
# Initialize PostgreSQL on RHEL-based systems
/usr/pgsql-16/bin/postgresql-16-setup initdb
fi
echo "[3/8] Starting and enabling PostgreSQL service..."
systemctl enable $POSTGRES_SERVICE
systemctl start $POSTGRES_SERVICE
# Wait for PostgreSQL to start
sleep 5
echo "[4/8] Creating database and enabling TimescaleDB extension..."
# Create database if it doesn't exist
sudo -u postgres createdb "$DATABASE_NAME" 2>/dev/null || log_warn "Database $DATABASE_NAME may already exist"
# Enable TimescaleDB extension
sudo -u postgres psql -d "$DATABASE_NAME" -c "CREATE EXTENSION IF NOT EXISTS timescaledb;"
echo "[5/8] Creating sample hypertable for demonstration..."
sudo -u postgres psql -d "$DATABASE_NAME" << EOF
-- Create sensor data table
CREATE TABLE IF NOT EXISTS sensor_data (
time TIMESTAMPTZ NOT NULL,
sensor_id INTEGER,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
-- Convert to hypertable if not already
SELECT create_hypertable('sensor_data', 'time', chunk_time_interval => INTERVAL '1 day', if_not_exists => TRUE);
-- Create additional sample tables
CREATE TABLE IF NOT EXISTS critical_metrics (
time TIMESTAMPTZ NOT NULL,
metric_name TEXT,
value DOUBLE PRECISION
);
CREATE TABLE IF NOT EXISTS app_logs (
time TIMESTAMPTZ NOT NULL,
log_level TEXT,
message TEXT
);
CREATE TABLE IF NOT EXISTS debug_traces (
time TIMESTAMPTZ NOT NULL,
trace_id TEXT,
span_data JSONB
);
-- Convert to hypertables
SELECT create_hypertable('critical_metrics', 'time', chunk_time_interval => INTERVAL '1 week', if_not_exists => TRUE);
SELECT create_hypertable('app_logs', 'time', chunk_time_interval => INTERVAL '1 day', if_not_exists => TRUE);
SELECT create_hypertable('debug_traces', 'time', chunk_time_interval => INTERVAL '1 day', if_not_exists => TRUE);
EOF
echo "[6/8] Inserting sample data..."
sudo -u postgres psql -d "$DATABASE_NAME" << EOF
-- Insert sample data for sensor_data
INSERT INTO sensor_data
SELECT
generate_series(
NOW() - INTERVAL '6 months',
NOW(),
INTERVAL '1 hour'
) as time,
(random() * 100)::INTEGER as sensor_id,
(random() * 40 + 10) as temperature,
(random() * 100) as humidity
ON CONFLICT DO NOTHING;
-- Insert sample data for other tables
INSERT INTO critical_metrics
SELECT
generate_series(NOW() - INTERVAL '3 months', NOW(), INTERVAL '5 minutes') as time,
'cpu_usage' as metric_name,
random() * 100 as value
ON CONFLICT DO NOTHING;
INSERT INTO app_logs
SELECT
generate_series(NOW() - INTERVAL '2 months', NOW(), INTERVAL '1 minute') as time,
CASE WHEN random() < 0.1 THEN 'ERROR' ELSE 'INFO' END as log_level,
'Sample log message' as message
ON CONFLICT DO NOTHING;
EOF
echo "[7/8] Configuring retention and compression policies..."
sudo -u postgres psql -d "$DATABASE_NAME" << EOF
-- Remove existing policies to avoid conflicts
SELECT remove_retention_policy('sensor_data', if_exists => true);
SELECT remove_compression_policy('sensor_data', if_exists => true);
SELECT remove_retention_policy('critical_metrics', if_exists => true);
SELECT remove_retention_policy('app_logs', if_exists => true);
SELECT remove_retention_policy('debug_traces', if_exists => true);
-- Configure compression for sensor_data
ALTER TABLE sensor_data SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'sensor_id',
timescaledb.compress_orderby = 'time'
);
-- Add retention policies
SELECT add_retention_policy('sensor_data', INTERVAL '$RETENTION_DAYS days');
SELECT add_retention_policy('critical_metrics', INTERVAL '2 years');
SELECT add_retention_policy('app_logs', INTERVAL '30 days');
SELECT add_retention_policy('debug_traces', INTERVAL '7 days');
-- Add compression policy
SELECT add_compression_policy('sensor_data', INTERVAL '$COMPRESSION_DAYS days');
-- Create custom retention function
CREATE OR REPLACE FUNCTION custom_retention_filter()
RETURNS VOID AS \$\$
BEGIN
-- Delete error-free logs older than 30 days
DELETE FROM app_logs
WHERE time < NOW() - INTERVAL '30 days'
AND log_level != 'ERROR';
-- Keep error logs for 90 days
DELETE FROM app_logs
WHERE time < NOW() - INTERVAL '90 days'
AND log_level = 'ERROR';
END;
\$\$ LANGUAGE plpgsql;
EOF
echo "[8/8] Verifying installation and configuration..."
# Check if TimescaleDB extension is installed
EXTENSION_CHECK=$(sudo -u postgres psql -d "$DATABASE_NAME" -t -c "SELECT COUNT(*) FROM pg_extension WHERE extname = 'timescaledb';")
if [[ "$EXTENSION_CHECK" -eq 1 ]]; then
log_info "✓ TimescaleDB extension is installed"
else
log_error "✗ TimescaleDB extension installation failed"
exit 1
fi
# Check hypertables
HYPERTABLE_COUNT=$(sudo -u postgres psql -d "$DATABASE_NAME" -t -c "SELECT COUNT(*) FROM timescaledb_information.hypertables;")
if [[ "$HYPERTABLE_COUNT" -ge 1 ]]; then
log_info "✓ Hypertables created successfully ($HYPERTABLE_COUNT tables)"
else
log_error "✗ Hypertable creation failed"
exit 1
fi
# Check retention policies
RETENTION_COUNT=$(sudo -u postgres psql -d "$DATABASE_NAME" -t -c "SELECT COUNT(*) FROM timescaledb_information.jobs WHERE proc_name = 'policy_retention';")
if [[ "$RETENTION_COUNT" -ge 1 ]]; then
log_info "✓ Retention policies configured ($RETENTION_COUNT policies)"
else
log_warn "⚠ No retention policies found"
fi
# Check compression policies
COMPRESSION_COUNT=$(sudo -u postgres psql -d "$DATABASE_NAME" -t -c "SELECT COUNT(*) FROM timescaledb_information.jobs WHERE proc_name = 'policy_compression';")
if [[ "$COMPRESSION_COUNT" -ge 1 ]]; then
log_info "✓ Compression policies configured ($COMPRESSION_COUNT policies)"
else
log_warn "⚠ No compression policies found"
fi
log_info "TimescaleDB installation and configuration completed successfully!"
log_info "Database: $DATABASE_NAME"
log_info "Retention period: $RETENTION_DAYS days"
log_info "Compression after: $COMPRESSION_DAYS days"
log_info ""
log_info "To connect to your database:"
log_info "sudo -u postgres psql -d $DATABASE_NAME"
log_info ""
log_info "To view retention jobs:"
log_info "SELECT * FROM timescaledb_information.jobs WHERE proc_name = 'policy_retention';"
Review the script before running. It must run as root; execute with: sudo bash install.sh