Connect Any OpenAI-Compatible LLM to Home Assistant Voice
This guide shows how to use a powerful LLM (like Claude, GPT-4, or any OpenAI-compatible API) as your Home Assistant voice assistant brain, replacing local models like Ollama/Llama.
Why?
– Smarter responses — Cloud LLMs understand context better than small local models
– Fast device control — Proxy handles common commands instantly without LLM roundtrip
– Best of both worlds — Quick local responses for home control, powerful LLM for complex questions
Architecture
Wake Word → Whisper STT → Ollama Proxy → Your LLM API → Piper TTS
↓
(Fast path for device control, weather, time queries)
Prerequisites
– Home Assistant with voice pipeline set up (Wyoming protocol)
– Whisper (faster-whisper) for speech-to-text
– Piper for text-to-speech
– OpenWakeWord for wake word detection
– Python 3.10+ on a server (can be same machine as HA or separate)
– An OpenAI-compatible API endpoint (OpenAI, Claude via proxy, local LLM with OpenAI API, etc.)
Step 1: Create the Ollama Proxy
This Python script makes your LLM look like an Ollama server to Home Assistant.
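Home Assistant's Ollama integration expects replies in Ollama's non-streaming chat format, so the proxy's main job is wrapping each LLM answer in that envelope. A rough sketch of the shape (field names match the `/api/chat` route in the script below):

```python
import json

def wrap_reply(text):
    # Shape of the /api/chat response returned to Home Assistant
    # (mirrors Ollama's non-streaming chat response).
    return {
        "model": "assistant",
        "created_at": "",  # Ollama sends an RFC3339 timestamp here
        "message": {"role": "assistant", "content": text},
        "done": True,      # no streaming: one complete message
    }

print(json.dumps(wrap_reply("The living room lights are on."), indent=2))
```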
Create `ollama-proxy.py`:
```python
#!/usr/bin/env python3
"""Ollama API proxy: makes any OpenAI-compatible LLM look like Ollama to Home Assistant."""
import re

import requests
import urllib3
from flask import Flask, request, jsonify

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

app = Flask(__name__)
# ============== CONFIGURATION ==============

# Your LLM API endpoint (OpenAI-compatible)
LLM_URL = "https://api.openai.com/v1/chat/completions"  # Or your endpoint
LLM_TOKEN = "your-api-key-here"
LLM_MODEL = "gpt-4"  # Or claude-3-opus, etc.

# Home Assistant API (for device control)
HA_URL = "https://homeassistant.local:8123"
HA_TOKEN = "your-long-lived-access-token"

# System prompt for voice responses
SYSTEM_PROMPT = """You are a voice assistant for Home Assistant. Keep responses concise and conversational; this is voice, not text.
Important: Respond in 1-2 sentences max. Be helpful and natural."""

# ============== DEVICE MAPPINGS ==============
# Customize these for YOUR home

LIGHTS = {
    "living room": "light.living_room",
    "kitchen": "light.kitchen",
    "bedroom": "light.bedroom",
    # Add your lights here
}

SWITCHES = {
    "fan": "switch.fan",
    # Add your switches here
}

COVERS = {
    "garage": "cover.garage_door",
    "blinds": "cover.blinds",
}

CLIMATE = {
    "thermostat": "climate.thermostat",
}

COLORS = {
    "white": {"color_temp_kelvin": 4000},
    "warm": {"color_temp_kelvin": 2700},
    "red": {"hs_color": [0, 100]},
    "green": {"hs_color": [120, 100]},
    "blue": {"hs_color": [240, 100]},
}
# ============== HELPER FUNCTIONS ==============

def call_ha_service(domain, service, data):
    """Call a Home Assistant service; return True on success."""
    try:
        resp = requests.post(
            f"{HA_URL}/api/services/{domain}/{service}",
            json=data,
            headers={"Authorization": f"Bearer {HA_TOKEN}"},
            verify=False,
            timeout=10,
        )
        return resp.status_code == 200
    except requests.RequestException:
        return False

def get_weather(location):
    """Get weather from wttr.in (no API key needed)."""
    try:
        location = location.strip().replace(" ", "+")
        resp = requests.get(f"https://wttr.in/{location}?format=%C+%t", timeout=5)
        if resp.status_code == 200:
            return f"Weather in {location.replace('+', ' ')}: {resp.text.strip()}"
    except requests.RequestException:
        pass
    return None

def find_entity(text, device_map):
    """Find a matching entity in a device map, preferring longer (more specific) names."""
    text_lower = text.lower()
    for name in sorted(device_map.keys(), key=len, reverse=True):
        if name in text_lower:
            return name, device_map[name]
    return None, None
# ============== FAST PATH HANDLERS ==============

def handle_time(text):
    """Handle time and date queries instantly."""
    from datetime import datetime
    text_lower = text.lower()
    if "what time" in text_lower:
        return datetime.now().strftime("It's %I:%M %p.")
    if "what day" in text_lower or "what's the date" in text_lower:
        return datetime.now().strftime("It's %A, %B %d.")
    return None

def handle_weather(text):
    """Handle weather queries."""
    text_lower = text.lower()
    patterns = [
        r"weather\s+(?:in|for|at)\s+(.+)",
        r"what(?:'s| is)\s+(?:the\s+)?weather\s+(?:in|for|at|like in)\s+(.+)",
    ]
    for pattern in patterns:
        match = re.search(pattern, text_lower)
        if match:
            location = match.group(1).strip().rstrip("?.,!")
            return get_weather(location)
    return None

def handle_lights(text):
    """Handle light commands."""
    text_lower = text.lower()
    name, entity = find_entity(text, LIGHTS)
    if not entity:
        return None
    params = {"entity_id": entity}
    # Check for color
    for color_name, color_data in COLORS.items():
        if color_name in text_lower:
            params.update(color_data)
            if call_ha_service("light", "turn_on", params):
                return f"{name.title()} lights set to {color_name}."
    # Check for brightness
    brightness_match = re.search(r"(\d+)\s*%", text_lower)
    if brightness_match:
        params["brightness_pct"] = int(brightness_match.group(1))
        if call_ha_service("light", "turn_on", params):
            return f"{name.title()} lights set to {params['brightness_pct']}%."
    # On/off
    if any(w in text_lower for w in ["turn off", "off"]):
        if call_ha_service("light", "turn_off", params):
            return f"{name.title()} lights off."
    elif any(w in text_lower for w in ["turn on", "on"]):
        if call_ha_service("light", "turn_on", params):
            return f"{name.title()} lights on."
    return None

def handle_thermostat(text):
    """Handle thermostat commands."""
    text_lower = text.lower()
    name, entity = find_entity(text, CLIMATE)
    if not entity:
        return None
    temp_match = re.search(r"(\d+)\s*(?:degrees|°)?", text_lower)
    if temp_match:
        temp = int(temp_match.group(1))
        if call_ha_service("climate", "set_temperature", {"entity_id": entity, "temperature": temp}):
            return f"Thermostat set to {temp} degrees."
    return None

def handle_covers(text):
    """Handle cover commands (garage, blinds, etc.)."""
    text_lower = text.lower()
    name, entity = find_entity(text, COVERS)
    if not entity:
        return None
    params = {"entity_id": entity}
    if "open" in text_lower:
        if call_ha_service("cover", "open_cover", params):
            return f"Opening {name}."
    elif "close" in text_lower:
        if call_ha_service("cover", "close_cover", params):
            return f"Closing {name}."
    return None
# ============== MAIN HANDLER ==============

def handle_command(text):
    """Try fast-path handlers first, then fall back to the LLM."""
    # Try each fast handler
    for handler in [handle_time, handle_weather, handle_lights, handle_thermostat, handle_covers]:
        result = handler(text)
        if result:
            return result
    # Fall back to LLM
    try:
        resp = requests.post(
            LLM_URL,
            json={
                "model": LLM_MODEL,
                "messages": [
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": text},
                ],
            },
            headers={
                "Authorization": f"Bearer {LLM_TOKEN}",
                "Content-Type": "application/json",
            },
            timeout=120,
        )
        if resp.status_code == 200:
            return resp.json()["choices"][0]["message"]["content"]
    except Exception as e:
        print(f"LLM error: {e}")
    return "Sorry, I couldn't process that."
# ============== OLLAMA API ROUTES ==============

@app.route("/api/chat", methods=["POST"])
def chat():
    """Handle Ollama chat requests from Home Assistant."""
    data = request.json
    messages = data.get("messages", [])
    if messages:
        user_message = messages[-1].get("content", "")
        response = handle_command(user_message)
        return jsonify({
            "model": "assistant",
            "created_at": "",
            "message": {"role": "assistant", "content": response},
            "done": True,
        })
    return jsonify({"error": "No message provided"}), 400

@app.route("/api/tags", methods=["GET"])
def tags():
    """Return available models (Ollama compatibility)."""
    return jsonify({
        "models": [{
            "name": "assistant:latest",
            "model": "assistant:latest",
            "modified_at": "2024-01-01T00:00:00Z",
            "size": 4661235994,
            "digest": "abc123",
            "details": {
                "format": "gguf",
                "family": "llama",
                "parameter_size": "8B",
                "quantization_level": "Q4_0",
            },
        }]
    })

@app.route("/api/version", methods=["GET"])
def version():
    return jsonify({"version": "0.1.0"})

@app.route("/", methods=["GET"])
def health():
    return "Ollama Proxy OK"

if __name__ == "__main__":
    print("Starting Ollama Proxy on port 11435...")
    app.run(host="0.0.0.0", port=11435)
```
Step 2: Install Dependencies and Run
```bash
# Install dependencies
pip install flask requests

# Run the proxy
python ollama-proxy.py
```
Step 3: Create a Systemd Service (Optional)
Create `/etc/systemd/system/ollama-proxy.service`:
```ini
[Unit]
Description=Ollama LLM Proxy
After=network.target
[Service]
Type=simple
User=your-username
WorkingDirectory=/path/to/script
ExecStart=/usr/bin/python3 /path/to/ollama-proxy.py
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
“`
Enable and start:
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now ollama-proxy
```
Step 4: Configure Home Assistant
Add the Proxy as an Ollama Service
1. Go to **Settings → Devices & Services → Add Integration**
2. Search for **Ollama**
3. Enter the URL: `http://YOUR_SERVER_IP:11435`
4. Click Submit
Create a Conversation Agent
1. Go to the new Ollama integration
2. Click Add conversation agent
3. Select model: `assistant:latest`
4. Uncheck “Prefer handling commands locally”
5. Save
Configure Voice Assistant
1. Go to Settings → Voice assistants
2. Edit your assistant (or create new)
3. Set Conversation agent to your new Ollama agent
4. Ensure STT is set to Whisper and TTS to Piper
Point Your Voice Device to the Assistant
For HA Voice devices or Wyoming Satellites:
1. Find the device’s Assistant selector entity
2. Set it to your new voice assistant
Step 5: Test
Say your wake word, then:
– “Turn on the living room lights”
– “What’s the weather in Seattle?”
– “Set the thermostat to 72”
– “What time is it?”
– “Tell me a joke” (goes to LLM)
Customization
Add More Devices
Edit the device mappings in the script:
```python
LIGHTS = {
    "living room": "light.living_room",
    "kitchen": "light.kitchen",
    "garage": "light.garage",
    # Add yours
}
```
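When device names overlap, `find_entity` checks longer names first, so the most specific mapping wins. A sketch of that behavior, reproducing the helper with hypothetical mappings for illustration:

```python
def find_entity(text, device_map):
    # Check longer names first so "living room" wins over a shorter
    # overlapping key like "room".
    text_lower = text.lower()
    for name in sorted(device_map.keys(), key=len, reverse=True):
        if name in text_lower:
            return name, device_map[name]
    return None, None

# Hypothetical mappings for illustration only.
LIGHTS = {
    "room": "light.room",
    "living room": "light.living_room",
}

print(find_entity("turn on the living room lights", LIGHTS))
# → ('living room', 'light.living_room'), not the shorter "room" match
```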
Add More Fast Handlers
Create new handler functions for device types you use frequently:
```python
def handle_music(text):
    if "play music" in text.lower():
        call_ha_service("media_player", "media_play", {"entity_id": "media_player.speaker"})
        return "Playing music."
    return None
```
Add to the handler list in `handle_command()`.
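The dispatch in `handle_command()` is simply first-match-wins over a list of functions that return `None` when they don't apply. A minimal sketch with stand-in handlers (names here are illustrative, not from the script):

```python
def handle_greeting(text):
    # Stand-in fast handler: answers only greetings.
    return "Hello!" if "hello" in text.lower() else None

def handle_echo(text):
    # Stand-in fast handler: answers only "say ..." commands.
    return f"You said: {text}" if text.startswith("say ") else None

FAST_HANDLERS = [handle_greeting, handle_echo]  # order matters: first match wins

def dispatch(text):
    for handler in FAST_HANDLERS:
        result = handler(text)
        if result:
            return result
    return "(would fall through to the LLM here)"

print(dispatch("hello there"))      # handled by the first fast handler
print(dispatch("what is entropy"))  # no fast handler matched
```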
Adjust LLM Timeout
If responses are slow, the HA voice pipeline may time out before the LLM answers. Options:
– Increase fast-path coverage for common commands
– Use a faster LLM model
– Adjust HA’s timeout settings (if available)
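Another option, not implemented in the script above, is to bound the wait on the proxy side: run the LLM call under a hard deadline and return a canned reply if it expires. A sketch using a stand-in for the real network call:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=2)

def slow_llm_call(text):
    # Stand-in for the real requests.post(...) to the LLM API.
    time.sleep(2)
    return f"LLM answer to: {text}"

def ask_with_deadline(text, deadline_s=0.5):
    future = _pool.submit(slow_llm_call, text)
    try:
        return future.result(timeout=deadline_s)
    except FutureTimeout:
        # Answer something before HA's pipeline gives up.
        return "That's taking a while; try asking again."

print(ask_with_deadline("summarize my day"))
```

Note the abandoned call keeps running in its worker thread; this trades a little wasted work for a voice pipeline that always answers within the deadline.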
Troubleshooting
“No such entity” errors
– Check device mappings match your actual HA entity IDs
– Verify HA_TOKEN has permission to control devices
Proxy not responding
– Check firewall allows port 11435
– Verify proxy is running: `curl http://localhost:11435/api/tags`
Voice assistant times out
– Add more fast-path handlers for common queries
– Check LLM API latency
Wake word not detected
– Check OpenWakeWord is running
– Verify wake word model is loaded
– Adjust microphone sensitivity
This approach was developed for https://github.com/clawdbot/clawdbot, a personal AI assistant framework. The proxy pattern works with any OpenAI-compatible API.