LocalAI is an API gateway that sits in front of your local AI models and exposes an OpenAI-compatible REST API. If you have multiple LLM backends running, LocalAI unifies them under a single endpoint. Write code against the OpenAI SDK, then switch to local models with a single line change.
The Problem It Solves
Normally, switching from OpenAI to a local model means rewriting the code that calls the API. LocalAI avoids this: point your base_url at LocalAI and existing OpenAI SDK code should keep working, as long as the model name you request is one that LocalAI serves.
Installation
# Binary (simplest)
curl -s https://raw.githubusercontent.com/mudler/LocalAI/master/hack/localai.sh | bash
# Docker (recommended)
docker pull quay.io/go-skynet/local-ai:latest
docker run -d --name localai -p 8080:8080 -v $(pwd)/models:/models quay.io/go-skynet/local-ai:latest
API Usage
LocalAI exposes OpenAI-style endpoints, including chat completions and embeddings:
# Chat completions
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello"}]}'
# Embeddings
curl http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model": "nomic-embed-text", "input": "Hello world"}'
Drop-In OpenAI Replacement
from openai import OpenAI
client = OpenAI(api_key="not-needed", base_url="http://localhost:8080/v1")
# Everything else stays exactly the same!
response = client.chat.completions.create(model="llama3", messages=[...])
print(response.choices[0].message.content)
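If you want to flip between OpenAI and LocalAI without touching the code at all, one option is to read the endpoint from the environment. The sketch below is only an illustration: the variable names LOCALAI_BASE_URL and LOCALAI_API_KEY and both helper functions are made up for this example, not something LocalAI or the OpenAI SDK defines.

```python
import os


def client_settings(env=None):
    """Build keyword arguments for openai.OpenAI from environment variables.

    LOCALAI_BASE_URL and LOCALAI_API_KEY are hypothetical names chosen for
    this sketch. When LOCALAI_BASE_URL is unset, an empty dict is returned
    so the SDK falls back to its defaults (including OPENAI_API_KEY).
    """
    env = os.environ if env is None else env
    base_url = env.get("LOCALAI_BASE_URL")
    if not base_url:
        return {}
    return {
        "base_url": base_url,
        # Same placeholder key as the example above.
        "api_key": env.get("LOCALAI_API_KEY", "not-needed"),
    }


def make_client(env=None):
    """Create an OpenAI SDK client that targets LocalAI when configured."""
    from openai import OpenAI  # imported here so client_settings works without the SDK

    return OpenAI(**client_settings(env))
```

With this in place, running with LOCALAI_BASE_URL=http://localhost:8080/v1 sends requests to LocalAI, and unsetting it goes back to OpenAI. The model name passed to client.chat.completions.create still has to exist on whichever side you are talking to.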
Multiple Backends
Configure in config.yaml:
preload_models:
  - name: "llama3"
    backend: ollama
    ollama_base_url: http://127.0.0.1:11434
  - name: "mistral-7b"
    backend: llama.cpp
    model_file: mistral-7b-instruct.Q4_K_M.gguf
Systemd Service
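# sudo does not apply to a shell redirection, so write the file through tee
sudo tee /etc/systemd/system/localai.service > /dev/null << EOF
[Unit]
Description=LocalAI Service
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/localai --config-path /etc/localai/config.yaml
Restart=always

[Install]
WantedBy=multi-user.target
EOF
# Reload systemd so it sees the new unit, then start it now and on every boot
sudo systemctl daemon-reload
sudo systemctl enable --now localai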
Troubleshooting
Connection refused: Check sudo systemctl status localai
Model not found: Verify with curl http://localhost:8080/v1/models
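Both checks can also be run from Python before your application starts. This is a minimal sketch using only the standard library. It assumes the /v1/models response follows the OpenAI list shape (a "data" array of objects with an "id" field), and the helper names model_ids and diagnose are invented for this example.

```python
import json
import urllib.request


def model_ids(payload):
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in (payload.get("data") or [])]


def diagnose(model, base_url="http://localhost:8080/v1", timeout=5):
    """Return "ok" or a short message describing what looks wrong."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            payload = json.load(resp)
    except OSError as exc:
        # Covers connection refused, timeouts and HTTP errors
        # (urllib.error.URLError is a subclass of OSError).
        return f"cannot reach {base_url}: {exc}"
    ids = model_ids(payload)
    if model not in ids:
        return f"model {model!r} not found; server lists: {ids}"
    return "ok"
```

Calling diagnose("llama3") gives a one-line answer that maps onto the two cases above: an unreachable server points to the service status, and a missing model points to the model configuration.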