Joachim Friberg 0aabfc8a72 Add llama-server and open-webui apps for local LLM inference

- llama-server: llama.cpp REST API server, 8G memory, port 8080
- open-webui: Chat UI connecting to llama-server, 2G memory, port 3000
- Both include x-casaos metadata for ZimaOS app store
- README with model download instructions and API examples

2026-04-19 22:25:22 +02:00

README.md

OpenWebUI

Modern chat web interface for local LLMs. Connects to llama-server via Docker internal networking.

Purpose

Port: 3000 (TCP)
Memory: 2G reservation
Category: AI / LLM UI

Requires the llama-server app to be running first. OpenWebUI connects to it at http://llama-server:8080 over the internal Docker network.

Prerequisites

Deploy and start the llama-server app first
Download a GGUF model into llama-server's /models directory
Ensure the llama-server container is healthy
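
The prerequisites above can be checked from a script before starting open-webui. The following is a minimal sketch, not part of the app: the function name `wait_for_llama` is hypothetical, and the default URL assumes llama-server's port 8080 is published on the host (inside the Docker network the host part would be `llama-server` instead).

```shell
#!/bin/sh
# Poll an HTTP endpoint until it answers successfully or the attempts run out.
# $1: URL to poll (default is an assumption: port 8080 published on this host)
# $2: maximum number of attempts (default 30)
# $3: pause in seconds between attempts (default 2)
wait_for_llama() {
    url="${1:-http://localhost:8080/v1/models}"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP errors as failure, -s: no progress output,
        # -o /dev/null: discard the body, --max-time: cap each request at 5 s
        if curl -fs -o /dev/null --max-time 5 "$url"; then
            echo "llama-server is ready after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "llama-server did not respond at $url" >&2
    return 1
}
```

For example, `wait_for_llama http://localhost:8080/v1/models 30 2` polls up to 30 times with a 2-second pause and returns non-zero if the server never answers.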

Access

Open in browser:

http://<your-zimaos-host>:3000

First run may take a moment to initialize.

Environment Variables

| Variable | Default | Description |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://llama-server:8080` | Internal URL of the llama-server API |
| `WEBUI_PORT` | `3000` | Container listen port |
| `TZ` | `Europe/Stockholm` | Timezone |
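
To confirm which values a running container actually received, the environment can be read back with `docker inspect`. This is a sketch under assumptions: the helper name `container_env` is hypothetical, and the container is assumed to be named `open-webui`.

```shell
#!/bin/sh
# Print the value of one environment variable of a container.
# $1: container name (assumed here to be open-webui)
# $2: variable name, e.g. OLLAMA_BASE_URL
container_env() {
    container="$1"
    name="$2"
    # List every NAME=value pair on its own line, then keep only the
    # requested variable and strip the "NAME=" prefix.
    docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$container" \
        | sed -n "s/^${name}=//p"
}
```

With the defaults from the table, `container_env open-webui OLLAMA_BASE_URL` would be expected to print `http://llama-server:8080`; an empty result suggests the variable is not set on the container.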

If Connection Fails

Verify llama-server is running: docker ps | grep llama-server
Check llama-server logs: docker logs llama-server
Ensure the llama-server MODEL environment variable matches the filename of your downloaded model

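Test connectivity from the ZimaOS shell. The llama-server hostname resolves only inside the Docker network, so on the host use localhost (assuming port 8080 is published):

curl http://localhost:8080/v1/models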
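
Because open-webui reaches llama-server by container name, both containers need to sit on a common Docker network. A rough way to check this is sketched below; the helper names are hypothetical, and the container names `llama-server` and `open-webui` are assumptions based on the app names.

```shell
#!/bin/sh
# Print the names of the Docker networks a container is attached to, one per line.
container_networks() {
    docker inspect \
        --format '{{range $k, $v := .NetworkSettings.Networks}}{{println $k}}{{end}}' "$1"
}

# Print every network that both containers are attached to.
# $1, $2: container names
shared_networks() {
    nets_a="$(container_networks "$1")" || return 1
    for net in $(container_networks "$2"); do
        # -x: match the whole line, -F: treat the name as a fixed string
        if printf '%s\n' "$nets_a" | grep -qxF -- "$net"; then
            printf '%s\n' "$net"
        fi
    done
    return 0
}
```

If `shared_networks llama-server open-webui` prints nothing, the two containers share no network and the name `llama-server` cannot resolve from inside open-webui.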

Volumes

| Path | Description |
|---|---|
| `/app/backend/data` | OpenWebUI persistent data (chat history, settings) |
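
To find where this container path is stored on the host, the mount list can be queried with `docker inspect`. This is only a sketch: `mount_source` is a hypothetical helper, the container name `open-webui` is an assumption, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Print the host-side source of the mount that backs a given container path.
# $1: container name
# $2: path inside the container, e.g. /app/backend/data
mount_source() {
    container="$1"
    dest="$2"
    # Emit "destination source" pairs, one mount per line, then select
    # the line whose destination matches and print its source.
    docker inspect --format '{{range .Mounts}}{{println .Destination .Source}}{{end}}' "$container" \
        | awk -v dest="$dest" '$1 == dest { print $2 }'
}
```

For example, `mount_source open-webui /app/backend/data` prints the host directory or volume path that holds chat history and settings, which is the location to include in backups.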

Architecture

amd64 (Intel/AMD x86_64)
arm64 (Apple Silicon, ARM servers)

Security

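security_opt: no-new-privileges:true (processes cannot gain additional privileges, for example through setuid binaries)
cap_drop: ALL (all Linux capabilities are dropped)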
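
Whether these settings are in effect on a running container can be read back from its host configuration. The check below is a sketch, with `check_hardening` as a hypothetical helper name and `open-webui` as an assumed container name.

```shell
#!/bin/sh
# Return 0 if the container runs with no-new-privileges and drops ALL capabilities.
# $1: container name
check_hardening() {
    opts="$(docker inspect --format '{{json .HostConfig.SecurityOpt}}' "$1")" || return 1
    caps="$(docker inspect --format '{{json .HostConfig.CapDrop}}' "$1")" || return 1
    case "$opts" in
        *no-new-privileges*) ;;
        *) echo "no-new-privileges is not set on $1" >&2; return 1 ;;
    esac
    case "$caps" in
        *'"ALL"'*) ;;
        *) echo "capabilities are not fully dropped on $1" >&2; return 1 ;;
    esac
    echo "$1: hardening options present"
}
```

Running `check_hardening open-webui` prints a confirmation when both options are present and explains on stderr which one is missing otherwise.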

Troubleshooting

"Cannot connect to LLM" error in UI

Verify that llama-server was started, and is still running, before open-webui
Check that OLLAMA_BASE_URL is set to http://llama-server:8080
Verify model file exists in /DATA/AppData/llama-server/models/
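
The model check in the last step can be scripted. The sketch below lists GGUF files in a directory; `list_models` is a hypothetical helper, and the default path is the one named above.

```shell
#!/bin/sh
# Print the names of all .gguf files in a directory.
# Returns 0 if at least one model was found, 1 otherwise.
# $1: models directory (defaults to the path used by llama-server on ZimaOS)
list_models() {
    dir="${1:-/DATA/AppData/llama-server/models}"
    found=1
    for f in "$dir"/*.gguf; do
        # When the glob matches nothing it stays literal, so skip it.
        [ -e "$f" ] || continue
        basename "$f"
        found=0
    done
    return "$found"
}
```

A call such as `list_models || echo "no GGUF model found"` makes the missing-model case explicit; the printed filenames can then be compared against the MODEL setting of llama-server.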

Slow responses

7B models on CPU are slow: token generation is typically limited by memory bandwidth and available cores, not only by single-thread performance
3B models are recommended for interactive speeds (roughly 15 tok/s or more)
Close other apps to free RAM