Self-hosted AI infrastructure is only as secure as the server it runs on. These services are quick to set up, and several of them bind to all interfaces by default. This guide covers hardening from day one: firewall rules, API authentication, rate limiting, network isolation, and AI-specific threats.
Threat Model
- External: unauthorized API access (GPU time theft), prompt injection attacks, data exfiltration, DDoS exhausting GPU memory.
- Internal: lateral movement, credential stuffing, accidental exposure.
- AI-specific: jailbreaking, data leakage through prompts, toxic output generation.
Step 1 — UFW Firewall
```bash
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp comment "SSH"
sudo ufw allow from 192.168.1.0/24 to any port 11434 comment "Ollama LAN"
sudo ufw allow from 192.168.1.0/24 to any port 8080 comment "LocalAI LAN"
sudo ufw allow 443/tcp comment "HTTPS"
sudo ufw deny 11434
sudo ufw deny 8080
sudo ufw enable
```
Step 2 — Ollama Hardening
Ollama binds to 127.0.0.1 by default, so it is already unreachable from the network. If only local processes need access, you can go further and serve over a Unix socket instead of TCP:
```bash
# Create a systemd drop-in (e.g. via: sudo systemctl edit ollama):
#   [Service]
#   Environment="OLLAMA_HOST=unix:///var/run/ollama.sock"
# Then reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama
```
Step 3 — API Key Authentication
Put nginx with htpasswd basic auth in front of LocalAI or vLLM:

```bash
sudo apt install apache2-utils
sudo htpasswd -c /etc/nginx/.htpasswd ollama_user
```

Then add to the nginx server or location block:

```nginx
auth_basic "AI Access";
auth_basic_user_file /etc/nginx/.htpasswd;
```
vLLM bearer token: start the OpenAI-compatible server with `--api-key YOUR_TOKEN`; clients must then send an `Authorization: Bearer YOUR_TOKEN` header.
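As a minimal sketch of the header format a client attaches (the `auth_headers` helper is illustrative, not part of vLLM or any client library):

```python
def auth_headers(token: str) -> dict:
    """Build the Authorization header a bearer-token-protected API expects."""
    return {"Authorization": f"Bearer {token}"}

# Any HTTP client can then attach it, e.g.:
# requests.post("http://localhost:8000/v1/completions",
#               headers=auth_headers("YOUR_TOKEN"), json=payload)
```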
Step 4 — Rate Limiting
Zones are defined in the `http` block, but they do nothing until applied with `limit_req` in a `location` block:

```nginx
# http block: define zones keyed by client IP
limit_req_zone $binary_remote_addr zone=ai_api:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=ai_chat:10m rate=5r/m;

# location block: apply a zone (burst absorbs short spikes)
limit_req zone=ai_api burst=20 nodelay;
```
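The idea behind nginx's limiter can be sketched as a token bucket in a few lines of Python (class name, parameters, and the injectable clock are illustrative, not an nginx API):

```python
import time

class TokenBucket:
    """Allow `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In an application you would keep one bucket per client IP and reject requests when `allow()` returns False, mirroring what `limit_req` does per `$binary_remote_addr`.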
Step 5 — Docker Network Isolation
```bash
docker network create --driver bridge ai-network
docker run -d --name ollama --network ai-network ollama/ollama
docker run -d --name openwebui --network ai-network \
  -p 8080:8080 \
  -e OLLAMA_BASE_URL=http://ollama:11434 \
  ghcr.io/open-webui/open-webui:main
```

Only Open WebUI publishes a port to the host; Ollama is reachable solely over the internal bridge network.
Step 6 — Fail2Ban
```bash
sudo apt install fail2ban
sudo systemctl enable --now fail2ban
# Configure jails in /etc/fail2ban/jail.local for sshd and nginx-http-auth
```
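A minimal `jail.local` covering both services might look like this (the retry counts and ban time are illustrative starting points, not requirements):

```ini
[sshd]
enabled  = true
maxretry = 5
bantime  = 1h

[nginx-http-auth]
enabled  = true
maxretry = 5
bantime  = 1h
```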
Step 7 — GPU Memory Limits
Prevent resource exhaustion by capping what a runaway service can consume: set `MemoryMax=32G` in the systemd unit for each AI service. Note that this caps system RAM, not GPU VRAM; VRAM limits must come from the inference server's own settings (for example, vLLM's `--gpu-memory-utilization` flag).
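A sketch of such a drop-in (the unit name and limit values are examples; adjust to your hardware):

```ini
# /etc/systemd/system/ollama.service.d/limits.conf
[Service]
MemoryMax=32G
CPUQuota=400%
```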
Step 8 — Prompt Injection Defense
```python
import re

def sanitize_prompt(user_input: str) -> str:
    """Filter common prompt-injection phrases and cap prompt length."""
    dangerous = [r"ignore previous", r"disregard your", r"system prompt:"]
    for p in dangerous:
        user_input = re.sub(p, "[filtered]", user_input, flags=re.IGNORECASE)
    return user_input[:4096]
```

Pattern blocklists like this are easy to bypass; treat them as one layer of defense, not a complete solution.

Security Checklist
- UFW enabled, default deny incoming
- AI services on localhost or behind firewall
- API key auth on all network-accessible endpoints
- Rate limiting configured
- Fail2Ban running
- Docker networks isolating services
- Systemd memory/CPU limits
- Monitoring for GPU utilization anomalies
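The last checklist item can be automated with a small watcher that parses `nvidia-smi` output and flags GPUs pegged near 100% when no job is expected (the helper names and the 95% threshold are illustrative assumptions):

```python
import subprocess

def parse_utilization(csv_output: str) -> list[int]:
    """Parse `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits` output."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]

def anomalous(utils: list[int], threshold: int = 95) -> bool:
    """Flag when any GPU sits at or above the threshold."""
    return any(u >= threshold for u in utils)

def check_gpus() -> bool:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return anomalous(parse_utilization(out))
```

Run it from cron or a systemd timer and alert on repeated hits; a GPU that is busy while you are not running anything is a strong sign of stolen compute.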