Merge branch 'main' into llama-server+open-webui/initial/local-llm-inference

This commit is contained in:
Joachim Friberg
2026-04-21 17:56:12 +02:00
19 changed files with 1611 additions and 15 deletions
+110
View File
@@ -0,0 +1,110 @@
# Plan: Add Snacks app to zima-apps
## Context
`apps.md` lists "Snacks" (https://github.com/derekshreds/snacks) as a pending app. It is an automated video library encoder with hardware acceleration (NVENC, QSV, VAAPI, AMF).
This plan has two parts:
1. Create the Snacks app definition
2. Update `apps.md` with agent instructions for future additions
---
## Part 1: Create `Apps/snacks/`
### Steps
1. **Create `Apps/snacks/` directory** from `_template`
2. **`docker-compose.yaml`** — adapt upstream `deploy-compose.yml`:
| Field | Upstream value | ZimaOS target |
|---|---|---|
| `name` | n/a | `snacks` |
| `image` | `derekshreds/snacks-docker:latest` | **Pinned version** — fetch latest release tag from GitHub, verify manifest exists |
| `network_mode` | `host` | Keep `host` — required for cluster UDP broadcast discovery |
| `privileged` | `true` | Keep `true` — required for `/dev/dri` access on QNAP/ZimaOS |
| `devices` | `/dev/dri:/dev/dri` | Keep — VAAPI/QSV hardware acceleration |
| `ports` | none (host mode) | Add `6767:6767` for web UI |
| `volumes` | QNAP-specific paths | Parameterize as `/DATA/AppData/$AppID/...` |
| `environment` | QNAP-specific ffmpeg path | Use default ffmpeg path; make jellyfin-ffmpeg path configurable |
Security baseline (MUST):
- `security_opt: ["no-new-privileges:true"]`
- `cap_drop: ["ALL"]`
- `deploy.resources.reservations` set to appropriate value
High-risk settings that MUST be documented in README:
- `network_mode: host` — required for cluster UDP broadcast
- `privileged: true` — required for `/dev/dri` access
- Device mount `/dev/dri` — GPU acceleration
3. **`README.md`** — document:
- Purpose: automated video library encoder with hardware acceleration
- Port: 6767 (web UI)
- Volumes: media library, logs, config
- High-risk settings with justification, alternatives evaluated, and risks
- Hardware acceleration options (VAAPI, QSV, NVENC)
- Cluster mode (UDP broadcast requirement)
- Health check endpoint
4. **Image pinning**: Before merge, verify the image tag exists in Docker Hub registry (manifest check)
5. **Run validation**: `./scripts/validate-appstore.sh`
6. **Optional**: `HOW_TO_VERIFY.md` with integration test cases
### Risk Assessment
- **High risk** due to `network_mode: host`, `privileged: true`, and device mounts
- Must document all three in README per AGENTS.md §3
- Image must be pinned — no `:latest`
### Branch name
`snacks/initial/add-video-encoder`
---
## Part 2: Update `apps.md`
### Changes
Replace current content with a table format that includes:
- Done/pending checkbox
- App name
- Source URL
- Brief description
- Agent instructions column (how to pick up this item)
The file should serve as an agent-facing backlog — clear enough that an agent can read it, understand what is needed, and execute without additional prompting.
### Suggested format
```markdown
## Backlog
| # | Done | Name | Source | What | Agent instructions |
|---|---|---|---|---|---|
| 1 | [ ] | Snacks | https://github.com/derekshreds/snacks | Automated video library encoder | Pick up, follow AGENTS.md §9 workflow, branch `snacks/initial/add-video-encoder` |
## Adding a new app
1. Copy `Apps/_template/``Apps/<app-id>/`
2. Set `name` in compose (lowercase + hyphen only)
3. Pin image to explicit version/tag (no `:latest`)
4. Add `x-casaos` metadata
5. Write `README.md` with purpose, ports, volumes, and risk justifications
6. Validate: `./scripts/validate-appstore.sh`
7. Run final validation before release: `./scripts/validate-appstore.sh --enforce-risk-docs`
```
---
## Verification
- `docker-compose -f Apps/snacks/docker-compose.yaml config` passes (no syntax errors)
- No `:latest` references
- `x-casaos` metadata complete
- README documents all high-risk settings with justification
- `./scripts/validate-appstore.sh` reports `Validation OK`
+192
View File
@@ -0,0 +1,192 @@
# Plan: Local LLM Zima App (Intel NUC8)
## Context
- **Hardware**: Intel NUC8 i7, 16GB RAM, 500GB SSD
- **Goal**: Zima app for local LLM inference with web UI
- **Constraints**: Intel Iris GPU cannot be used for LLM offload; CPU-only inference
- **Decisions**:
- Include OpenWebUI (two-container solution)
- 8G memory reservation (allows 7B Q4 models)
- App name: `llama-server`
---
## Technology Decision
### vLLM — **REJECTED**
- Requires NVIDIA CUDA GPU
- Cannot run on Intel NUC
### llama.cpp (llama-server) — **SELECTED**
- CPU-only, AVX2/AVX512 optimized
- Built-in REST API server
- Minimal footprint, fast for quantized models
- Best fit for NUC8 constraints
### LocalAI — **BACKUP OPTION**
- More features (TTS, image gen, multi-model)
- Can backend to llama.cpp
- Heavier; only choose if extra features needed
### OpenWebUI — **RECOMMENDED COMPANION**
- Modern chat UI for LLM
- Docker-based, easy to deploy alongside
- Can be separate Zima app or documented companion
---
## Architecture: Two Zima Apps
```
┌─────────────────────────┐ ┌─────────────────────────┐
│ llama-server │ │ open-webui │
│ - REST API :8080 │────▶│ - Chat UI :3000 │
│ - Serves model │ │ - Connects to LLM API │
└─────────────────────────┘ └─────────────────────────┘
```
Both are separate Zima apps, deployed independently. OpenWebUI references `http://llama-server:8080` via Docker internal networking.
### App 1: `llama-server`
- Container: `ghcr.io/ggerganov/llama.cpp:server`
- Port: 8080
- Memory: 8G reservation
### App 2: `open-webui`
- Container: `ghcr.io/open-webui/open-webui:main`
- Port: 3000
- Memory: 2G reservation
- Environment: `OLLAMA_BASE_URL=http://llama-server:8080`
---
## App: `llama-server`
### Container: `ghcr.io/ggerganov/llama.cpp:server`
**Environment Variables**:
| Variable | Default | Description |
|----------|---------|-------------|
| `MODEL` | (required) | Model filename in `/models` |
| `CTX_SIZE` | 2048 | Context window size |
| `N_THREADS` | auto | CPU threads (auto = all) |
| `HOST` | 0.0.0.0 | Listen address |
| `PORT` | 8080 | API port |
| `MAX_TOKENS` | 512 | Max tokens to generate |
**Volumes**:
| Container | Description |
|-----------|-------------|
| `/models` | Model files (GGUF format) |
| `/DATA/AppData/$AppID/logs` | Server logs |
**Ports**:
| Container | Protocol | Description |
|-----------|----------|-------------|
| 8080 | TCP | llama.cpp REST API |
**Resources**:
- Memory reservation: **8G** (allows 7B Q4 models)
**Security**:
- `security_opt: no-new-privileges:true`
- `cap_drop: ALL`
- No privileged needed (CPU-only)
### Model Download (Documented in README)
Users download models manually:
```bash
# Example: Download Llama 3.2 3B Q4_K_M
curl -L -o /DATA/AppData/llama-server/models/llama-3.2-3b-q4_k_m.gguf \
"https://huggingface.co/QuantFactory/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct.Q4_K_M.gguf"
```
**Recommended Models for 16GB RAM**:
| Model | Size | Quant | RAM Needed | Speed (est) |
|-------|------|-------|------------|-------------|
| Llama 3.2 3B | 1.8GB | Q4_K_M | ~4GB | ~15-20 tok/s |
| Phi-3.5 Mini 3B | 1.8GB | Q4_K_M | ~4GB | ~15-20 tok/s |
| Mistral 7B | 4.1GB | Q4_K_M | ~6-7GB | ~8-12 tok/s |
| Qwen 2.5 7B | 4.4GB | Q4_K_M | ~6-7GB | ~8-12 tok/s |
---
## App: `open-webui`
### Container: `ghcr.io/open-webui/open-webui:main`
**Environment Variables**:
| Variable | Default | Description |
|----------|---------|-------------|
| `OLLAMA_BASE_URL` | http://llama-server:8080 | LLM API endpoint |
| `WEBUI_PORT` | 3000 | Web UI port |
**Ports**:
| Container | Protocol | Description |
|-----------|----------|-------------|
| 3000 | TCP | OpenWebUI |
**Resources**:
- Memory reservation: **2G**
**Notes**:
- Connects to `http://llama-server:8080` via Docker internal networking
- Requires `llama-server` app to be running first
---
## File Structure
```
Apps/llama-server/
├── docker-compose.yaml
├── README.md
└── HOW_TO_VERIFY.md (optional)
Apps/open-webui/
├── docker-compose.yaml
├── README.md
└── HOW_TO_VERIFY.md (optional)
```
---
## Implementation Steps
### llama-server
1. Create `Apps/llama-server/` directory
2. Write `docker-compose.yaml` with:
- Image: `ghcr.io/ggerganov/llama.cpp:server`
- 8G memory reservation
- Port 8080
- Model volume at `/models`
- Env vars: MODEL, CTX_SIZE, N_THREADS, HOST, PORT
3. Write `README.md` with:
- Model download instructions
- First-run setup
- API testing examples
- Performance tips for NUC8
4. Validate with `./scripts/validate-appstore.sh`
### open-webui
1. Create `Apps/open-webui/` directory
2. Write `docker-compose.yaml` with:
- Image: `ghcr.io/open-webui/open-webui:main`
- 2G memory reservation
- Port 3000
- Environment: `OLLAMA_BASE_URL=http://llama-server:8080`
3. Write `README.md` with:
- Prerequisites (llama-server must be running first)
- How to access
- Troubleshooting connection issues
4. Validate with `./scripts/validate-appstore.sh`
---
## Risk Assessment
| Risk | Level | Mitigation |
|------|-------|------------|
| NUC8 RAM insufficient for 7B with other apps | Medium | 8G reservation; close other apps for 7B |
| Model download issues | Low | Provide direct HF links in README |
| OpenWebUI API compatibility | Low | llama.cpp v1 API is OpenAI-compatible |
| Intel AVX2 performance | Low | llama.cpp auto-detects and uses AVX2 |
+93
View File
@@ -0,0 +1,93 @@
# Plan: Update AGENTS.md with Commit/Test/Build/Push Workflow
## Context
The repo's AGENTS.md (section 4 "Arbetsflöde för ändringar") currently only mentions `./scripts/validate-appstore.sh` as a loose recommendation. It lacks detailed guidance on the full development lifecycle: committing, testing images, building the appstore zip, and pushing.
## Goal
Add a new section to AGENTS.md (or expand section 4) covering the full workflow:
### A. Branch & Commit Workflow
- Branch naming per existing section 8
- Single-focus commits (one logical change per commit)
- Commit message format: short summary + bullet points for details
- What files can be committed (Apps/ scope rules from section 9)
### B. Image Verification (Pre-commit)
- Before committing compose changes, verify images are online:
```bash
docker manifest inspect <image:tag@sha256:...>
```
- Alternative: use build-appstore-zip.sh which does this automatically
### C. Local Validation
- Always run before push/PR:
```bash
./scripts/validate-appstore.sh
```
### D. Building the Appstore Package
- Script: `./scripts/build-appstore-zip.sh`
- Outputs to `dist/phirna-appstore.zip`
- Auto-generates SHA256 checksum
- Verifies all images online before building
- Commits `dist/` separately from app changes
### E. Push & Release
- Push order: app commits → build zip → commit zip → push
- PR description must include (per existing section 6):
- Affected app IDs
- Security risk level
- High-risk settings changes
## Proposed New Section (12) in AGENTS.md
```
## 12) Release- och publiceringsarbetsflöde
### Steg 1: Branch
Skapa branch enligt format i sektion 8:
<appnamn>/<initial|bugfix|update>/<detalj>
### Steg 2: Verifiera images (innan commit)
Kontrollera att alla Docker-images är tillgängliga online:
docker manifest inspect <image:tag@sha256:...>
### Steg 3: Validera lokalt
Kör validering innan commit:
./scripts/validate-appstore.sh
### Steg 4: Committa ändringar
- Små, reviewbara commits
- Separera appfiler från dist/-filer
- Commit-meddelande: rubrik + bulletpunkter
### Steg 5: Bygg appstore-zip
./scripts/build-appstore-zip.sh
- Skapar dist/phirna-appstore.zip
- Verifierar alla images
- Genererar SHA256
### Steg 6: Committa dist/
Separer commit för dist/ från appfiler:
git add dist/ && git commit -m "Build appstore zip"
### Steg 7: Push och PR
git push -u origin <branch>
Skapa PR med:
- Vilka app-id som påverkas
- Säkerhetsrisk (låg/medel/hög)
- Högrisk-inställningar vid introduktion
```
## Implementation
1. Read current AGENTS.md
2. Insert new section 12 after existing section 11
3. Renumber subsequent sections (12 → 13, etc.)
## Questions for User
- Should this be a new numbered section or expand existing section 4?
- Is `dist/phirna-appstore.zip` the correct output name for all repos, or should this be configurable?
+170
View File
@@ -0,0 +1,170 @@
# Plan: Gitea Bot User Setup for Tea CLI
## Context
Enable the agent (Kilo) to interact with Gitea (git.phirna.uk) via the `tea` CLI for:
- Creating branches
- Committing and pushing changes
- Creating pull requests
- Managing issues and labels
## Step 1: Username Suggestion
**Suggested username: `kilo-bot`**
While not directly Norse mythology, "Kilo" evokes the Norse root meaning "coal" or "torch". Alternatives if you prefer pure mythology:
| Username | Origin |
|----------|--------|
| `kilo-bot` | Kilo = "torch of life" from Old Norse "Kjöl" |
| `mimir-bot` | Mimir - Norse god of wisdom, keeper of knowledge |
| `hnir-bio` | Hnir - "breath" in Old Norse |
| `sowilo-bot` | Sowilo - the S rune, meaning "sun" |
**Recommendation**: `kilo-bot` — maintains brand consistency with the agent name "Kilo".
## Step 2: Required Permissions
Based on Gitea granular scopes, the bot needs:
| Scope | Reason |
|-------|--------|
| `write:repository` | Create branches, push commits, create PRs |
| `read:repository` | Read branches, commits, repos |
| `read:user` | Identify authenticated user |
| `write:issue` | Create/update issues if needed |
| `read:org` | Read org membership if needed |
**Alternative**: Use `write:repository, read:user` for minimal permissions covering all git operations.
**NOT needed**: `admin` (would allow deleting repos, managing orgs, etc.)
## Step 3: Create the Bot User
Requires admin access on git.phirna.uk. Steps:
1. Go to `https://git.phirna.uk/admin/users/new` (or use `tea admin`)
2. Create user `kilo-bot` with email (e.g., `kilo-bot@phirna.uk`)
3. Set a strong random password (store in password manager)
4. Optionally add to relevant organization(s)
## Step 4: Generate Access Token
1. Login as `kilo-bot`
2. Go to Settings → Applications → "Manage Access Tokens"
3. Create token with name `kilo-cli` and scopes:
- `repository:write`
- `user:read`
4. Copy the generated token securely
## Step 5: Configure Tea
```bash
tea logins add --name kilo-bot --url https://git.phirna.uk --token <generated-token>
```
Or set environment variable `GITEA_TOKEN` or configure in `~/.config/tea/config.yml`.
## Step 6: Update AGENTS.md
Add new section or subsection covering:
- Bot user credentials (token) storage approach
- Expected token scopes
- tea command patterns for common operations
- Security considerations (bot has limited scope)
## Step 7: Create Skill (optional but recommended)
Create `.kilo/.skills/gitea-agent.md`:
- Standardized tea commands for branch creation
- Commit/push workflow via tea
- PR creation commands
- Issue management shortcuts
## Security Considerations
- Bot should **NOT** be admin
- Token should be scoped to `write:repository` + `read:user` only
- Token stored in environment or secured config, NOT in repo
- Consider setting bot's `MaxTokenPermissions` at org level if supported
## Verification Steps
After setup, test:
```bash
# Verify identity
tea whoami --login kilo-bot
# List repos (should see assigned repos)
tea repos list --login kilo-bot
# Create a test branch
git checkout -b test/tea-cli-test
git push -u origin test/tea-cli-test --force
```
## Confirmed Decisions
| Decision | Choice |
|----------|--------|
| Username | `mimir` (already created on Gitea) |
| Token storage | Tea login system (`tea logins add`) |
| Access scope | All repos on git.phirna.uk |
## Implementation Steps
### Step 1: Generate Access Token for mimir
1. Login to `https://git.phirna.uk` as `mimir`
2. Go to Settings → Applications → "Manage Access Tokens"
3. Create token with name `kilo-cli` and scopes:
- `repository:write`
- `user:read`
4. Copy the generated token securely
### Step 2: Add Tea Login
```bash
tea logins add --name mimir --url https://git.phirna.uk --token <generated-token>
```
Verify with:
```bash
tea whoami --login mimir
```
### Step 3: Update AGENTS.md
Add new section under "Repo-invarianter" or create new section:
```
## X) Gitea Bot (mimir)
För att kunna skapa branches, commits och PRs via tea:
- Användare: `mimir` på git.phirna.uk
- Token lagras i tea's login system (`tea logins add`)
- Scopes: `repository:write`, `user:read`
- Säkerhet: mimir är inte admin, token har begränsade scopes
Exempel-kommandon:
tea pr create --login mimir --owner <owner> --repo <repo>
tea pulls list --login mimir --owner <owner> --repo <repo>
```
### Step 4: Create Skill (optional)
Create `.kilo/.skills/gitea-agent.md` with tea command patterns.
### Step 5: Verify Access
```bash
# Should show mimir's identity
tea whoami --login mimir
# Should list repos mimir can access
tea repos list --login mimir
```
+34
View File
@@ -129,6 +129,7 @@ Sektionen "data att samla" ska minst täcka:
- loggar från berörda containers,
- konkreta felobservationer (hostname, tidpunkt, förväntat vs faktiskt beteende).
<<<<<<< HEAD
## 11) Release- och publiceringsarbetsflöde
### Steg 1: Branch
@@ -182,3 +183,36 @@ PR ska inkludera:
- Vilka app-id som påverkas.
- Säkerhetsrisk (låg/medel/hög).
- Högrisk-inställningar vid introduktion eller förändring.
## 11) Gitea Bot (mimir)
För att kunna skapa branches, commits och PRs via tea-CLI:
- **Användare**: `mimir` på git.phirna.uk
- **Token**: Lagras i tea's login-system via `tea logins add`
- **Scopes**: `repository:write`, `user:read`
- **Säkerhet**: mimir är inte admin, token har begränsade scopes
### Vanliga kommandon
```bash
# Sätt aktiv login
export GITEA_LOGIN=mimir
# Lista repos
tea repos list --login mimir
# Skapa branch och push
git checkout -b <branch-name>
git push -u origin <branch-name>
# Skapa PR
tea pulls create --login mimir --owner <owner> --repo <repo> --head <branch> --base <target>
# Lista öppna PRs
tea pulls list --login mimir --owner <owner> --repo <repo>
# Hantera issues
tea issues list --login mimir --owner <owner> --repo <repo>
tea issues create --login mimir --owner <owner> --repo <repo> --title "Titel" --body "Body"
```
+44 -4
View File
@@ -67,7 +67,27 @@ Förväntat resultat:
- posten med `10.0.4.2` har `used=true`.
- `containers` innehåller `ip-verify-nginx`.
### Test C: Disable/Delete efter frigöring
### Test C: DNS create när posten är enabled + used
Förutsätter DNS config i appen, exempel för AdGuard:
- `DNS_PROVIDER=adguard`
- `DNS_BASE_DOMAIN=home.arpa`
- `ADGUARD_URL=http://<adguard-ip>:3000`
- `ADGUARD_USERNAME=<user>`
- `ADGUARD_PASSWORD=<password>`
Verifiera att record skapats:
```bash
dig +short lan-test.home.arpa @<adguard-ip>
```
Förväntat resultat:
- returnerar `10.0.4.2`.
### Test D: Disable/Delete efter frigöring
Stoppa testcontainer:
@@ -103,7 +123,7 @@ Förväntat resultat:
## 3) Negativt / fail-closed testfall
### Test D: Blockera disable när IP används
### Test E: Blockera disable när IP används
1. Skapa + enable som i Test A.
2. Starta container som i Test B.
@@ -120,19 +140,39 @@ Förväntat resultat:
- HTTP `409`.
- feltext som anger att posten används av container.
### Test F: Fail-closed vid DNS-fel
1. Se till att en post är `enabled` och `used` (Test A+B).
2. Sabotera DNS-auth tillfälligt, exempel:
- ändra `ADGUARD_PASSWORD` till fel värde och starta om appen.
3. Försök disable/delete på posten.
```bash
curl -sS -o /tmp/dns-fail.out -w '%{http_code}\n' \
-X POST "http://127.0.0.1:31810/api/entries/${ENTRY_ID}/disable"
cat /tmp/dns-fail.out
```
Förväntat resultat:
- HTTP `409` eller `503`.
- feltext som indikerar DNS-synkfel.
- posten ska inte lämna systemet i delvis uppdaterat läge.
## 4) DNS / nät / TLS verifiering
### DNS (om hostname används i LAN)
```bash
DNS_SERVER="<dns-server-ip>"
HOSTNAME_TO_TEST="<hostname-i-lan>"
HOSTNAME_TO_TEST="lan-test.home.arpa"
dig +short "${HOSTNAME_TO_TEST}" @"${DNS_SERVER}"
```
Förväntat resultat:
- returnerar avsedd LAN-IP.
- returnerar avsedd LAN-IP när posten är `enabled && used`.
- ingen träff när posten inte längre är `used` eller är `disabled`.
### Nätverk (lyssning och routning)
+37 -4
View File
@@ -13,12 +13,16 @@ Exempel: istället för att köra `ip addr add 10.0.4.2/16 dev eth0` via SSH, ka
- Sorterbar tabell: namn, IP-adress, used/unused, containernamn, device, enable/disable.
- Used/unused-kontroll via Docker API (`NetworkSettings.Ports`) med exakt `HostIp`-match.
- Include stopped containers i used-kontroll.
- DNS-livscykel (opt-in): skapar A-record när `enabled=true` och `used=true`, tar bort record när villkoret inte längre gäller.
- DNS-namn byggs från `name` + `DNS_BASE_DOMAIN` => `<name>.<base-domain>` (DNS-säkrad label).
- Fail-closed:
- disable blockeras om IP används av minst en container,
- delete blockeras om posten är enabled eller used,
- disable/delete blockeras om Docker-usage inte kan verifieras.
- disable/delete blockeras om Docker-usage inte kan verifieras,
- state-ändringar blockeras om nödvändig DNS-synk misslyckas.
- Startup reconcile: enabled-poster återappliceras vid appstart.
- Manuell refresh-knapp (ingen websocket i v1).
- DNS reconcile körs i bakgrunden med poll-interval.
- Manuell refresh-knapp för UI-status (ingen websocket i v1).
## Portar
@@ -77,6 +81,33 @@ Viktiga environment-variabler:
- alternativt `http://127.0.0.1:2375` för socket-proxy.
- `DOCKER_TIMEOUT_SECONDS` (default `3`)
- `STATE_FILE` (default `/data/entries.json`)
- `DNS_PROVIDER` (`none`, `adguard`, `rfc2136`; default `none`)
- `DNS_BASE_DOMAIN` (exempel: `home.arpa`)
- `DNS_TTL_SECONDS` (default `120`)
- `DNS_SYNC_INTERVAL_SECONDS` (default `15`)
AdGuard (`DNS_PROVIDER=adguard`):
- `ADGUARD_URL` (exempel: `http://127.0.0.1:3000`)
- `ADGUARD_USERNAME`
- `ADGUARD_PASSWORD`
- `ADGUARD_API_TOKEN` (framtida alternativ, inte aktiv auth-väg i v1)
RFC2136 (`DNS_PROVIDER=rfc2136`):
- `RFC2136_SERVER`
- `RFC2136_ZONE`
- `RFC2136_PORT` (default `53`)
- `RFC2136_TSIG_KEY_NAME` (valfri om osignerade updates tillåts)
- `RFC2136_TSIG_SECRET` (base64, valfri utan TSIG)
- `RFC2136_TSIG_ALGORITHM` (default `hmac-sha256`)
## DNS-beteende
- Villkor för record: endast när posten är `enabled` och `used`.
- När posten inte längre är `used` tas DNS-record bort i bakgrundsreconcile.
- Vid enable/disable/delete görs direkt DNS-synk och operationen failar vid synkfel (fail-closed).
- Om Docker usage-kontroll är okänd i bakgrundsloop görs inga DNS-mutationer i den cykeln.
## Integrationstester
@@ -91,7 +122,9 @@ Testerna mockar Docker API och `ip`-kommandoflöde och verifierar:
- exakt `HostIp`-matchning,
- fail-closed disable/delete,
- blockering vid enabled/used,
- startup reconcile av enabled-poster.
- startup reconcile av enabled-poster,
- DNS create/delete på `enabled && used`,
- fail-closed rollback vid DNS-synkfel.
## Auth-notis
@@ -102,4 +135,4 @@ Auth/autorisering ska implementeras i en senare version och är en uttalad roadm
## Roadmap (ej v1)
- WebSocket-baserad live-uppdatering av used-status.
- DNS-integration (Cloudflare/lokal DNS) kopplat till IP-poster och hostnamn.
- Alternativ auth för AdGuard via API-token.
@@ -10,6 +10,20 @@ class Settings:
docker_api_url: str
docker_timeout_seconds: float
app_port: int
dns_provider: str
dns_base_domain: str
dns_ttl_seconds: int
dns_sync_interval_seconds: float
adguard_url: str
adguard_username: str
adguard_password: str
adguard_api_token: str
rfc2136_server: str
rfc2136_zone: str
rfc2136_port: int
rfc2136_tsig_key_name: str
rfc2136_tsig_secret: str
rfc2136_tsig_algorithm: str
def get_settings() -> Settings:
@@ -18,4 +32,18 @@ def get_settings() -> Settings:
docker_api_url=os.getenv("DOCKER_API_URL", "unix:///var/run/docker.sock"),
docker_timeout_seconds=float(os.getenv("DOCKER_TIMEOUT_SECONDS", "3")),
app_port=int(os.getenv("APP_PORT", "31810")),
dns_provider=os.getenv("DNS_PROVIDER", "none"),
dns_base_domain=os.getenv("DNS_BASE_DOMAIN", ""),
dns_ttl_seconds=int(os.getenv("DNS_TTL_SECONDS", "120")),
dns_sync_interval_seconds=float(os.getenv("DNS_SYNC_INTERVAL_SECONDS", "15")),
adguard_url=os.getenv("ADGUARD_URL", ""),
adguard_username=os.getenv("ADGUARD_USERNAME", ""),
adguard_password=os.getenv("ADGUARD_PASSWORD", ""),
adguard_api_token=os.getenv("ADGUARD_API_TOKEN", ""),
rfc2136_server=os.getenv("RFC2136_SERVER", ""),
rfc2136_zone=os.getenv("RFC2136_ZONE", ""),
rfc2136_port=int(os.getenv("RFC2136_PORT", "53")),
rfc2136_tsig_key_name=os.getenv("RFC2136_TSIG_KEY_NAME", ""),
rfc2136_tsig_secret=os.getenv("RFC2136_TSIG_SECRET", ""),
rfc2136_tsig_algorithm=os.getenv("RFC2136_TSIG_ALGORITHM", "hmac-sha256"),
)
@@ -0,0 +1,309 @@
from __future__ import annotations
from dataclasses import dataclass
import base64
import http.client
import json
from typing import Protocol
from urllib.parse import urlparse
class DnsSyncError(RuntimeError):
pass
class DnsProvider(Protocol):
def upsert_a_record(self, fqdn: str, ip: str, ttl: int) -> None:
raise NotImplementedError
def delete_a_record(self, fqdn: str) -> None:
raise NotImplementedError
def to_fqdn(entry_name: str, base_domain: str) -> str:
label = _sanitize_label(entry_name)
domain = base_domain.strip().lower().strip(".")
if not domain:
raise DnsSyncError("DNS_BASE_DOMAIN is required when DNS is enabled")
return f"{label}.{domain}"
def _sanitize_label(value: str) -> str:
source = value.strip().lower()
if not source:
raise DnsSyncError("Entry name is required to create DNS record")
cleaned: list[str] = []
prev_dash = False
for ch in source:
if "a" <= ch <= "z" or "0" <= ch <= "9":
cleaned.append(ch)
prev_dash = False
continue
if ch in {" ", "_", "-"} and not prev_dash:
cleaned.append("-")
prev_dash = True
label = "".join(cleaned).strip("-")
if not label:
raise DnsSyncError(f"Entry name cannot produce DNS-safe label: {value!r}")
if len(label) > 63:
raise DnsSyncError("DNS label derived from entry name is too long (max 63)")
return label
@dataclass(frozen=True)
class AdguardConfig:
url: str
username: str
password: str
timeout_seconds: float
class AdguardDnsProvider:
def __init__(self, config: AdguardConfig):
parsed = urlparse(config.url)
if parsed.scheme not in {"http", "https"}:
raise ValueError("ADGUARD_URL must use http or https")
if not parsed.netloc:
raise ValueError("ADGUARD_URL must include host")
self._https = parsed.scheme == "https"
self._host = parsed.hostname or "localhost"
self._port = parsed.port
self._base_path = parsed.path.rstrip("/")
self._username = config.username
self._password = config.password
self._timeout = config.timeout_seconds
self._session_cookie: str | None = None
def upsert_a_record(self, fqdn: str, ip: str, ttl: int) -> None:
del ttl # AdGuard rewrite records do not expose TTL controls.
rewrites = self._list_rewrites()
for item in rewrites:
if item.get("domain") == fqdn and item.get("answer") == ip:
return
if item.get("domain") == fqdn and item.get("answer") != ip:
self._request("POST", "/control/rewrite/delete", {"domain": fqdn, "answer": item.get("answer", "")})
self._request("POST", "/control/rewrite/add", {"domain": fqdn, "answer": ip})
def delete_a_record(self, fqdn: str) -> None:
rewrites = self._list_rewrites()
for item in rewrites:
if item.get("domain") != fqdn:
continue
self._request("POST", "/control/rewrite/delete", {"domain": fqdn, "answer": item.get("answer", "")})
def _list_rewrites(self) -> list[dict]:
payload = self._request("GET", "/control/rewrite/list", None)
if not isinstance(payload, list):
raise DnsSyncError("AdGuard returned unexpected rewrite list format")
output: list[dict] = []
for item in payload:
if isinstance(item, dict):
output.append(item)
return output
def _request(self, method: str, path: str, payload: dict | None) -> object:
if self._session_cookie is None:
self._login()
return self._request_with_session(method, path, payload, retry_on_auth=True)
def _login(self) -> None:
body = {"name": self._username, "password": self._password}
payload, headers = self._raw_request("POST", "/control/login", body, include_auth=False)
if headers is None:
raise DnsSyncError("AdGuard login failed: missing response headers")
cookie = headers.get("set-cookie", "")
session = ""
for piece in cookie.split(";"):
piece = piece.strip()
if piece.startswith("agh_session="):
session = piece
break
if not session:
raise DnsSyncError("AdGuard login failed: no agh_session cookie")
self._session_cookie = session
del payload
def _request_with_session(self, method: str, path: str, payload: dict | None, retry_on_auth: bool) -> object:
body, _ = self._raw_request(method, path, payload, include_auth=True)
if isinstance(body, dict) and body.get("message") == "unauthorized":
if retry_on_auth:
self._session_cookie = None
self._login()
return self._request_with_session(method, path, payload, retry_on_auth=False)
raise DnsSyncError("AdGuard request unauthorized")
return body
def _raw_request(
self, method: str, path: str, payload: dict | None, include_auth: bool
) -> tuple[object, dict[str, str] | None]:
conn: http.client.HTTPConnection | http.client.HTTPSConnection
if self._https:
conn = http.client.HTTPSConnection(self._host, self._port, timeout=self._timeout)
else:
conn = http.client.HTTPConnection(self._host, self._port, timeout=self._timeout)
request_path = f"{self._base_path}{path}"
raw = ""
headers = {"Content-Type": "application/json"}
if include_auth and self._session_cookie:
headers["Cookie"] = self._session_cookie
if payload is not None:
raw = json.dumps(payload)
try:
conn.request(method, request_path, body=raw, headers=headers)
response = conn.getresponse()
body_text = response.read().decode("utf-8", errors="replace")
response_headers = {k.lower(): v for k, v in response.getheaders()}
except OSError as exc:
raise DnsSyncError(f"AdGuard request failed for {path}: {exc}") from exc
finally:
conn.close()
if response.status < 200 or response.status >= 300:
raise DnsSyncError(
f"AdGuard request failed for {path}: HTTP {response.status} {response.reason}; body={body_text[:400]}"
)
if not body_text.strip():
return {}, response_headers
try:
return json.loads(body_text), response_headers
except json.JSONDecodeError:
return body_text, response_headers
@dataclass(frozen=True)
class Rfc2136Config:
server: str
zone: str
port: int
timeout_seconds: float
tsig_key_name: str
tsig_secret: str
tsig_algorithm: str
class Rfc2136DnsProvider:
def __init__(self, config: Rfc2136Config):
if not config.server.strip():
raise ValueError("RFC2136_SERVER is required")
if not config.zone.strip():
raise ValueError("RFC2136_ZONE is required")
self._server = config.server.strip()
self._zone = config.zone.strip().rstrip(".")
self._port = config.port
self._timeout = config.timeout_seconds
self._key_name = config.tsig_key_name.strip()
self._secret = config.tsig_secret.strip()
self._algorithm = config.tsig_algorithm.strip() or "hmac-sha256"
def upsert_a_record(self, fqdn: str, ip: str, ttl: int) -> None:
rcode, tsigkeyring, update, query = self._dns_modules()
zone_text = self._zone_with_dot()
keyring = self._keyring_or_none(tsigkeyring)
target = self._absolute_name(fqdn)
try:
req = update.Update(zone_text, keyring=keyring, keyname=self._key_name or None, keyalgorithm=self._algorithm)
req.delete(target, "A")
req.add(target, int(ttl), "A", ip)
response = query.tcp(req, self._server, port=self._port, timeout=self._timeout)
except Exception as exc: # noqa: BLE001
raise DnsSyncError(f"RFC2136 upsert failed for {fqdn} -> {ip}: {exc}") from exc
if response.rcode() != rcode.NOERROR:
text = rcode.to_text(response.rcode())
raise DnsSyncError(f"RFC2136 upsert failed for {fqdn}: {text}")
def delete_a_record(self, fqdn: str) -> None:
rcode, tsigkeyring, update, query = self._dns_modules()
zone_text = self._zone_with_dot()
keyring = self._keyring_or_none(tsigkeyring)
target = self._absolute_name(fqdn)
try:
req = update.Update(zone_text, keyring=keyring, keyname=self._key_name or None, keyalgorithm=self._algorithm)
req.delete(target, "A")
response = query.tcp(req, self._server, port=self._port, timeout=self._timeout)
except Exception as exc: # noqa: BLE001
raise DnsSyncError(f"RFC2136 delete failed for {fqdn}: {exc}") from exc
if response.rcode() != rcode.NOERROR:
text = rcode.to_text(response.rcode())
raise DnsSyncError(f"RFC2136 delete failed for {fqdn}: {text}")
def _dns_modules(self):
try:
import dns.query as query
import dns.rcode as rcode
import dns.tsigkeyring as tsigkeyring
import dns.update as update
except ImportError as exc:
raise DnsSyncError("dnspython is required for RFC2136 mode") from exc
return rcode, tsigkeyring, update, query
def _keyring_or_none(self, tsigkeyring):
if not self._key_name and not self._secret:
return None
if not self._key_name or not self._secret:
raise DnsSyncError("RFC2136 TSIG requires both key name and secret")
key_name = self._key_name if self._key_name.endswith(".") else f"{self._key_name}."
try:
base64.b64decode(self._secret, validate=True)
except Exception as exc: # noqa: BLE001
raise DnsSyncError("RFC2136_TSIG_SECRET must be valid base64") from exc
if self._algorithm not in {"hmac-sha256", "hmac-sha512", "hmac-sha1", "hmac-md5.sig-alg.reg.int"}:
raise DnsSyncError(f"Unsupported TSIG algorithm: {self._algorithm}")
return tsigkeyring.from_text({key_name: self._secret})
def _zone_with_dot(self) -> str:
return self._zone if self._zone.endswith(".") else f"{self._zone}."
def _absolute_name(self, fqdn: str) -> str:
return fqdn if fqdn.endswith(".") else f"{fqdn}."
def build_dns_provider(
provider_name: str,
*,
adguard_url: str,
adguard_username: str,
adguard_password: str,
rfc2136_server: str,
rfc2136_zone: str,
rfc2136_port: int,
rfc2136_tsig_key_name: str,
rfc2136_tsig_secret: str,
rfc2136_tsig_algorithm: str,
timeout_seconds: float,
) -> DnsProvider | None:
mode = provider_name.strip().lower()
if not mode or mode == "none":
return None
if mode == "adguard":
if not adguard_url.strip():
raise DnsSyncError("ADGUARD_URL is required for DNS_PROVIDER=adguard")
if not adguard_username.strip() or not adguard_password.strip():
raise DnsSyncError("ADGUARD_USERNAME and ADGUARD_PASSWORD are required for DNS_PROVIDER=adguard")
return AdguardDnsProvider(
AdguardConfig(
url=adguard_url,
username=adguard_username,
password=adguard_password,
timeout_seconds=timeout_seconds,
)
)
if mode == "rfc2136":
return Rfc2136DnsProvider(
Rfc2136Config(
server=rfc2136_server,
zone=rfc2136_zone,
port=rfc2136_port,
timeout_seconds=timeout_seconds,
tsig_key_name=rfc2136_tsig_key_name,
tsig_secret=rfc2136_tsig_secret,
tsig_algorithm=rfc2136_tsig_algorithm,
)
)
raise DnsSyncError(f"Unsupported DNS_PROVIDER: {provider_name}")
@@ -1,6 +1,7 @@
from __future__ import annotations
from pathlib import Path
import threading
from fastapi import FastAPI, HTTPException
from fastapi.responses import FileResponse, JSONResponse
@@ -8,6 +9,7 @@ from fastapi.staticfiles import StaticFiles
from app.config import get_settings
from app.docker_api import DockerApiClient, DockerApiError, DockerUsageResolver
from app.dns_sync import DnsSyncError, build_dns_provider
from app.ip_commands import CommandError, IpAddressManager
from app.service import (
ConflictError,
@@ -25,11 +27,34 @@ def build_service() -> EntryService:
docker_client = DockerApiClient(settings.docker_api_url, timeout_seconds=settings.docker_timeout_seconds)
usage_resolver = DockerUsageResolver(docker_client)
ip_manager = IpAddressManager()
return EntryService(storage=storage, usage_resolver=usage_resolver, ip_manager=ip_manager)
dns_provider = build_dns_provider(
settings.dns_provider,
adguard_url=settings.adguard_url,
adguard_username=settings.adguard_username,
adguard_password=settings.adguard_password,
rfc2136_server=settings.rfc2136_server,
rfc2136_zone=settings.rfc2136_zone,
rfc2136_port=settings.rfc2136_port,
rfc2136_tsig_key_name=settings.rfc2136_tsig_key_name,
rfc2136_tsig_secret=settings.rfc2136_tsig_secret,
rfc2136_tsig_algorithm=settings.rfc2136_tsig_algorithm,
timeout_seconds=settings.docker_timeout_seconds,
)
return EntryService(
storage=storage,
usage_resolver=usage_resolver,
ip_manager=ip_manager,
dns_provider=dns_provider,
dns_base_domain=settings.dns_base_domain,
dns_ttl_seconds=settings.dns_ttl_seconds,
)
service = build_service()
app = FastAPI(title="Docker IP Addr Manager", version="0.1.0")
settings = get_settings()
stop_event = threading.Event()
background_thread: threading.Thread | None = None
static_dir = Path(__file__).parent / "static"
app.mount("/static", StaticFiles(directory=static_dir), name="static")
@@ -41,6 +66,39 @@ def startup_reconcile() -> None:
if errors:
for error in errors:
print(f"[startup-reconcile] {error}")
dns_errors = service.reconcile_dns_records()
if dns_errors:
for error in dns_errors:
print(f"[dns-reconcile-startup] {error}")
_start_dns_background_loop()
@app.on_event("shutdown")
def shutdown_reconcile() -> None:
stop_event.set()
if background_thread and background_thread.is_alive():
background_thread.join(timeout=2.0)
def _dns_background_worker(interval_seconds: float) -> None:
while not stop_event.wait(interval_seconds):
errors = service.reconcile_dns_records()
for error in errors:
print(f"[dns-reconcile] {error}")
def _start_dns_background_loop() -> None:
global background_thread
if settings.dns_provider.strip().lower() in {"", "none"}:
return
if background_thread and background_thread.is_alive():
return
background_thread = threading.Thread(
target=_dns_background_worker,
args=(max(settings.dns_sync_interval_seconds, 1.0),),
daemon=True,
)
background_thread.start()
@app.get("/")
@@ -139,3 +197,8 @@ def delete_entry(entry_id: str) -> dict:
@app.exception_handler(DockerApiError)
async def docker_error_handler(_, exc: DockerApiError):
return JSONResponse(status_code=503, content={"detail": str(exc)})
@app.exception_handler(DnsSyncError)
async def dns_error_handler(_, exc: DnsSyncError):
return JSONResponse(status_code=503, content={"detail": str(exc)})
@@ -46,6 +46,8 @@ class EntryView:
used: bool
containers: list[str]
usage_known: bool
dns_desired: bool = False
dns_last_error: str | None = None
def to_dict(self) -> dict:
return {
@@ -59,4 +61,6 @@ class EntryView:
"used": self.used,
"containers": self.containers,
"usage_known": self.usage_known,
"dns_desired": self.dns_desired,
"dns_last_error": self.dns_last_error,
}
@@ -7,6 +7,7 @@ from typing import Callable
from uuid import uuid4
from app.docker_api import DockerApiError, DockerUsageResolver
from app.dns_sync import DnsProvider, DnsSyncError, to_fqdn
from app.interfaces import list_host_interfaces
from app.ip_commands import CommandError, IpAddressManager
from app.models import EntryView, IpEntry
@@ -50,12 +51,19 @@ class EntryService:
usage_resolver: DockerUsageResolver,
ip_manager: IpAddressManager,
interface_provider: Callable[[], list[str]] = list_host_interfaces,
dns_provider: DnsProvider | None = None,
dns_base_domain: str = "",
dns_ttl_seconds: int = 120,
):
self._storage = storage
self._usage_resolver = usage_resolver
self._ip_manager = ip_manager
self._interface_provider = interface_provider
self._dns_provider = dns_provider
self._dns_base_domain = dns_base_domain
self._dns_ttl_seconds = dns_ttl_seconds
self._lock = threading.Lock()
self._dns_errors_by_id: dict[str, str] = {}
def list_interfaces(self) -> list[str]:
interfaces = self._interface_provider()
@@ -89,6 +97,8 @@ class EntryService:
used=used,
containers=containers,
usage_known=usage_known,
dns_desired=bool(self._dns_provider) and usage_known and used and entry.enabled,
dns_last_error=self._dns_errors_by_id.get(entry.id),
)
)
@@ -100,6 +110,7 @@ class EntryService:
entries = self._storage.list_entries()
self._assert_device_exists(parsed["device"])
self._assert_unique_binding(entries, ip=parsed["ip"], cidr=parsed["cidr"], device=parsed["device"])
self._assert_unique_name(entries, name=parsed["name"])
created = IpEntry(
id=uuid4().hex,
@@ -129,6 +140,7 @@ class EntryService:
device=parsed["device"],
ignore_entry_id=entry_id,
)
self._assert_unique_name(entries, name=parsed["name"], ignore_entry_id=entry_id)
updated = IpEntry(
id=current.id,
name=parsed["name"],
@@ -145,6 +157,7 @@ class EntryService:
with self._lock:
entries = self._storage.list_entries()
index, entry = _find_entry(entries, entry_id)
previous_enabled = entry.enabled
if enabled:
self._ip_manager.ensure_present(entry.ip, entry.cidr, entry.device)
@@ -154,6 +167,12 @@ class EntryService:
self._ip_manager.ensure_absent(entry.ip, entry.cidr, entry.device)
entry.enabled = False
try:
self._sync_dns_for_entry_locked(entry, strict=True)
except Exception: # noqa: BLE001
self._rollback_enable_change(entry, previous_enabled)
raise
entries[index] = entry
self._storage.save_entries(entries)
return entry
@@ -166,8 +185,10 @@ class EntryService:
raise ConflictError("Disable entry before deleting")
self._assert_not_used(entry)
self._delete_dns_for_entry_locked(entry, strict=True)
remaining = [candidate for candidate in entries if candidate.id != entry_id]
self._storage.save_entries(remaining)
self._dns_errors_by_id.pop(entry.id, None)
def reconcile_enabled_entries(self) -> list[str]:
errors: list[str] = []
@@ -186,6 +207,84 @@ class EntryService:
self._storage.save_entries(entries)
return errors
def reconcile_dns_records(self) -> list[str]:
if not self._dns_provider:
return []
errors: list[str] = []
with self._lock:
entries = self._storage.list_entries()
usage_map, usage_known, usage_error = self._resolve_usage(entries)
if not usage_known:
msg = f"Docker usage check failed for DNS reconcile: {usage_error or 'unknown error'}"
for entry in entries:
self._dns_errors_by_id[entry.id] = msg
return [msg]
for entry in entries:
used = bool(usage_map.get(entry.ip, set()))
desired = entry.enabled and used
try:
self._apply_dns_state_locked(entry, desired)
self._dns_errors_by_id.pop(entry.id, None)
except (DnsSyncError, DependencyError, ConflictError) as exc:
self._dns_errors_by_id[entry.id] = str(exc)
errors.append(f"{entry.name}: {exc}")
return errors
def _rollback_enable_change(self, entry: IpEntry, previous_enabled: bool) -> None:
try:
if previous_enabled:
self._ip_manager.ensure_present(entry.ip, entry.cidr, entry.device)
entry.enabled = True
else:
self._ip_manager.ensure_absent(entry.ip, entry.cidr, entry.device)
entry.enabled = False
except CommandError:
pass
def _sync_dns_for_entry_locked(self, entry: IpEntry, strict: bool) -> None:
if not self._dns_provider:
return
usage_map, usage_known, usage_error = self._resolve_usage([entry])
if not usage_known:
msg = f"Docker usage check failed: {usage_error or 'unknown error'}"
self._dns_errors_by_id[entry.id] = msg
if strict:
raise DependencyError(msg)
return
desired = entry.enabled and bool(usage_map.get(entry.ip, set()))
self._apply_dns_state_locked(entry, desired)
self._dns_errors_by_id.pop(entry.id, None)
def _delete_dns_for_entry_locked(self, entry: IpEntry, strict: bool) -> None:
if not self._dns_provider:
return
try:
fqdn = to_fqdn(entry.name, self._dns_base_domain)
self._dns_provider.delete_a_record(fqdn)
self._dns_errors_by_id.pop(entry.id, None)
except DnsSyncError as exc:
self._dns_errors_by_id[entry.id] = str(exc)
if strict:
raise ConflictError(f"DNS delete failed for {entry.name}: {exc}") from exc
def _apply_dns_state_locked(self, entry: IpEntry, desired: bool) -> None:
if not self._dns_provider:
return
try:
fqdn = to_fqdn(entry.name, self._dns_base_domain)
if desired:
self._dns_provider.upsert_a_record(fqdn, entry.ip, self._dns_ttl_seconds)
else:
self._dns_provider.delete_a_record(fqdn)
except DnsSyncError as exc:
raise ConflictError(f"DNS sync failed for {entry.name}: {exc}") from exc
def _assert_not_used(self, entry: IpEntry) -> None:
try:
usage = self._usage_resolver.resolve_ip_usage({entry.ip})
@@ -223,6 +322,14 @@ class EntryService:
if entry.ip == ip and entry.cidr == cidr and entry.device == device:
raise ConflictError("Entry with same ip/cidr/device already exists")
def _assert_unique_name(self, entries: list[IpEntry], name: str, ignore_entry_id: str | None = None) -> None:
target = name.strip().lower()
for entry in entries:
if ignore_entry_id and entry.id == ignore_entry_id:
continue
if entry.name.strip().lower() == target:
raise ConflictError("Entry name must be unique")
def _assert_device_exists(self, device: str) -> None:
interfaces = self.list_interfaces()
if device not in interfaces:
@@ -258,15 +365,15 @@ def _parse_payload(payload: dict) -> dict:
if any(ch.isspace() for ch in device):
raise ValidationError("Field 'device' cannot contain whitespace")
raw_cidr = payload.get("cidr")
if raw_cidr is None:
cidr_raw = payload.get("cidr")
if cidr_raw is None:
raise ValidationError("Field 'cidr' is required")
try:
cidr = int(raw_cidr)
cidr = int(cidr_raw)
except (TypeError, ValueError) as exc:
raise ValidationError("Field 'cidr' must be an integer") from exc
if cidr < 0 or cidr > 32:
raise ValidationError("Field 'cidr' must be in range 0..32")
raise ValidationError("Field 'cidr' must be between 0 and 32")
return {
"name": name,
@@ -1,2 +1,3 @@
fastapi==0.116.1
uvicorn==0.35.0
dnspython==2.7.0
@@ -19,6 +19,20 @@ services:
STATE_FILE: /data/entries.json
DOCKER_API_URL: unix:///var/run/docker.sock
DOCKER_TIMEOUT_SECONDS: "3"
DNS_PROVIDER: none
DNS_BASE_DOMAIN: home.arpa
DNS_TTL_SECONDS: "120"
DNS_SYNC_INTERVAL_SECONDS: "15"
ADGUARD_URL: http://127.0.0.1:3000
ADGUARD_USERNAME: ""
ADGUARD_PASSWORD: ""
ADGUARD_API_TOKEN: ""
RFC2136_SERVER: ""
RFC2136_ZONE: ""
RFC2136_PORT: "53"
RFC2136_TSIG_KEY_NAME: ""
RFC2136_TSIG_SECRET: ""
RFC2136_TSIG_ALGORITHM: hmac-sha256
volumes:
- type: bind
source: /DATA/AppData/$AppID/data
@@ -38,6 +52,33 @@ services:
- container: DOCKER_TIMEOUT_SECONDS
description:
en_us: Timeout in seconds for Docker API requests
- container: DNS_PROVIDER
description:
en_us: DNS backend (none, adguard, rfc2136)
- container: DNS_BASE_DOMAIN
description:
en_us: Base domain for generated hostnames like <name>.<domain>
- container: DNS_TTL_SECONDS
description:
en_us: TTL in seconds for DNS A records
- container: DNS_SYNC_INTERVAL_SECONDS
description:
en_us: Background DNS reconcile interval in seconds
- container: ADGUARD_URL
description:
en_us: AdGuard Home URL for DNS_PROVIDER=adguard
- container: ADGUARD_USERNAME
description:
en_us: AdGuard Home username for DNS_PROVIDER=adguard
- container: ADGUARD_PASSWORD
description:
en_us: AdGuard Home password for DNS_PROVIDER=adguard
- container: RFC2136_SERVER
description:
en_us: RFC2136 nameserver host or IP
- container: RFC2136_ZONE
description:
en_us: RFC2136 zone name (for example home.arpa)
volumes:
- container: /data
description:
@@ -91,7 +132,7 @@ x-casaos:
be validated.
Start by adding a new IP Entry in this app and connect it to the appropriate Device.
Then install, or update, a zima app and choose network: bridge.
* Enter a name for the app, will be used to create dns records in future release, add a non-used IP Address, CIDR, and choose device
* Enter a name for the app, and the app can create DNS records as <name>.<DNS_BASE_DOMAIN> when DNS sync is enabled
* Click add, and a new row appears under.
* Click "Enable" to have this app setup the host to listen to this IP Address
* To to the ZimaOS App Store, choose an app, and do a "Custom Install"
@@ -11,6 +11,7 @@ BACKEND_DIR = ROOT_DIR / "backend"
sys.path.insert(0, str(BACKEND_DIR))
from app.docker_api import DockerApiError, DockerUsageResolver
from app.dns_sync import DnsSyncError
from app.models import IpEntry
from app.service import ConflictError, DependencyError, EntryService
from app.storage import EntryStorage
@@ -75,12 +76,40 @@ class FakeIpManager:
self.present.discard((ip, cidr, device))
class FakeDnsProvider:
def __init__(self, fail_upsert=False, fail_delete=False):
self.records = {}
self.upserts = []
self.deletes = []
self.fail_upsert = fail_upsert
self.fail_delete = fail_delete
def upsert_a_record(self, fqdn: str, ip: str, ttl: int):
if self.fail_upsert:
raise DnsSyncError("upsert failed")
self.records[fqdn] = ip
self.upserts.append((fqdn, ip, ttl))
def delete_a_record(self, fqdn: str):
if self.fail_delete:
raise DnsSyncError("delete failed")
self.records.pop(fqdn, None)
self.deletes.append(fqdn)
def assert_true(condition, message):
if not condition:
raise AssertionError(message)
def build_service(tmp_path: Path, entries=None, usage_resolver=None, ip_manager=None):
def build_service(
tmp_path: Path,
entries=None,
usage_resolver=None,
ip_manager=None,
dns_provider=None,
dns_base_domain="home.arpa",
):
storage = EntryStorage(str(tmp_path / "entries.json"))
if entries:
storage.save_entries(entries)
@@ -93,6 +122,9 @@ def build_service(tmp_path: Path, entries=None, usage_resolver=None, ip_manager=
usage_resolver=resolver,
ip_manager=ipm,
interface_provider=lambda: ["eth0", "eth1"],
dns_provider=dns_provider,
dns_base_domain=dns_base_domain,
dns_ttl_seconds=120,
)
@@ -163,6 +195,79 @@ def test_reconcile_reapplies_enabled(tmp_path: Path):
assert_true(("10.0.4.10", 16, "eth0") in ip_manager.present, "enabled IP must be re-applied on startup reconcile")
def test_dns_upsert_on_enable_when_used(tmp_path: Path):
entry = IpEntry(id="dns1", name="Lan App", ip="10.0.4.20", cidr=16, device="eth0", enabled=False)
resolver = FakeUsageResolver(mapping={"10.0.4.20": {"nginx"}})
ip_manager = FakeIpManager()
dns = FakeDnsProvider()
service = build_service(tmp_path, entries=[entry], usage_resolver=resolver, ip_manager=ip_manager, dns_provider=dns)
service.set_enabled("dns1", enabled=True)
assert_true(dns.records.get("lan-app.home.arpa") == "10.0.4.20", "DNS record should be created on enable+used")
def test_dns_no_upsert_on_enable_when_unused(tmp_path: Path):
entry = IpEntry(id="dns2", name="Lan App 2", ip="10.0.4.21", cidr=16, device="eth0", enabled=False)
resolver = FakeUsageResolver(mapping={})
ip_manager = FakeIpManager()
dns = FakeDnsProvider()
service = build_service(tmp_path, entries=[entry], usage_resolver=resolver, ip_manager=ip_manager, dns_provider=dns)
service.set_enabled("dns2", enabled=True)
assert_true("lan-app-2.home.arpa" not in dns.records, "Unused entries must not create DNS record")
def test_dns_reconcile_deletes_when_no_longer_used(tmp_path: Path):
entry = IpEntry(id="dns3", name="Lan App 3", ip="10.0.4.22", cidr=16, device="eth0", enabled=True)
resolver = FakeUsageResolver(mapping={"10.0.4.22": {"nginx"}})
dns = FakeDnsProvider()
service = build_service(tmp_path, entries=[entry], usage_resolver=resolver, dns_provider=dns)
service.reconcile_dns_records()
assert_true(dns.records.get("lan-app-3.home.arpa") == "10.0.4.22", "record should exist after used reconcile")
resolver._mapping = {}
service.reconcile_dns_records()
assert_true("lan-app-3.home.arpa" not in dns.records, "record should be removed when usage disappears")
def test_dns_fail_closed_rolls_back_enable(tmp_path: Path):
entry = IpEntry(id="dns4", name="Lan App 4", ip="10.0.4.23", cidr=16, device="eth0", enabled=False)
resolver = FakeUsageResolver(mapping={"10.0.4.23": {"nginx"}})
ip_manager = FakeIpManager()
dns = FakeDnsProvider(fail_upsert=True)
service = build_service(tmp_path, entries=[entry], usage_resolver=resolver, ip_manager=ip_manager, dns_provider=dns)
failed = False
try:
service.set_enabled("dns4", enabled=True)
except ConflictError:
failed = True
assert_true(failed, "enable must fail-closed when DNS upsert fails")
current = service.list_entries().items[0]
assert_true(not current.enabled, "entry must roll back to disabled on DNS failure")
assert_true(not ip_manager.is_present("10.0.4.23", 16, "eth0"), "IP presence must roll back on DNS failure")
def test_dns_fail_closed_blocks_delete(tmp_path: Path):
entry = IpEntry(id="dns5", name="Lan App 5", ip="10.0.4.24", cidr=16, device="eth0", enabled=False)
resolver = FakeUsageResolver(mapping={})
dns = FakeDnsProvider(fail_delete=True)
service = build_service(tmp_path, entries=[entry], usage_resolver=resolver, dns_provider=dns)
failed = False
try:
service.delete_entry("dns5")
except ConflictError:
failed = True
assert_true(failed, "delete must fail-closed when DNS cleanup fails")
assert_true(len(service.list_entries().items) == 1, "entry must remain when delete fails")
def main():
test_exact_hostip_match_only()
@@ -172,6 +277,11 @@ def main():
test_disable_blocked_when_docker_check_fails(tmp_path)
test_delete_blocked_when_enabled(tmp_path)
test_reconcile_reapplies_enabled(tmp_path)
test_dns_upsert_on_enable_when_used(tmp_path)
test_dns_no_upsert_on_enable_when_unused(tmp_path)
test_dns_reconcile_deletes_when_no_longer_used(tmp_path)
test_dns_fail_closed_rolls_back_enable(tmp_path)
test_dns_fail_closed_blocks_delete(tmp_path)
print("Integration tests passed")
+143
View File
@@ -0,0 +1,143 @@
# Snacks
Automated video library encoder with hardware acceleration (NVENC, QSV, VAAPI, AMF).
## Purpose
Snacks batch-transcodes video libraries using FFmpeg with hardware acceleration.
It monitors directories, skips already-encoded files, retries with fallbacks, and supports distributed cluster encoding across multiple ZimaOS nodes.
## Port
- `6767/tcp` — Web UI at `http://localhost:6767`
## Volumes
| Host path | Container path | Description |
|---|---|---|
| `/DATA/AppData/$AppID/media` | `/app/work/uploads` | Media library — source files to encode |
| `/DATA/AppData/$AppID/logs` | `/app/work/logs` | Transcoding logs |
| `/DATA/AppData/$AppID/config` | `/app/work/config` | Settings and SQLite database |
## Hardware Acceleration
Snacks uses GPU encoding via `/dev/dri`:
| Driver | Codecs | Devices |
|---|---|---|
| VAAPI (Linux) | H.265, H.264 | Intel iHD/i965, AMD VAAPI |
| QSV (Intel) | H.265, H.264 | Intel Quick Sync Video |
| NVENC (NVIDIA) | H.265, H.264 | NVIDIA GPUs via CUDA |
| AMF (AMD) | H.265, H.264 | AMD GPUs |
Auto-detection runs on first encode and picks the best available encoder.
## Cluster Mode
Snacks supports distributed encoding across multiple ZimaOS nodes.
- Nodes discover each other via UDP broadcast on the LAN
- One instance acts as coordinator; others are workers
- Jobs are assigned automatically; failed nodes are re-assigned
- A shared secret authenticates intra-cluster communication
**UDP broadcast requirement**: Cluster mode requires `network_mode: host` — bridge mode blocks LAN broadcast discovery, making nodes invisible to each other.
## Health Check
`http://localhost:6767/Home/Health` — returns HTTP 200 when the backend is ready.
## Privilegier och säkerhet
Aktiva säkerhetsinställningar i denna app:
- `security_opt: ["no-new-privileges:true"]`
- `cap_drop: ["ALL"]`
- `privileged: true`
- `network_mode: host`
- Device mount: `/dev/dri:/dev/dri`
Motivering:
- `no-new-privileges:true` och `cap_drop: ["ALL"]` kompenserar med lägsta möjliga capability-yta.
- Isolerad data-path under `/DATA/AppData/$AppID/...`.
## Säkerhetsavvikelser
### 1. `network_mode: host`
**Varför det behövs:**
- Snacks cluster nodes discover each other via UDP broadcast on the local network.
- Bridge mode only forwards unicast traffic; broadcast packets never reach other nodes.
- Without host networking, cluster mode is non-functional.
**Alternativ som utvärderats:**
- Bridge mode with port exposure: broadcasts are not forwarded by the Docker bridge.
- Static IP configuration: requires manual node addressing and is error-prone.
- Multicast DNS (mDNS): not supported by Docker bridge in all deployments.
**Risker:**
- Container has full access to all host ports.
- No network isolation between Snacks and other services on the host.
- If the container is compromised, the attacker has host network access.
**Riskreducering:**
- `cap_drop: ["ALL"]` minimizes syscall surface.
- `no-new-privileges:true` prevents privilege escalation.
- No sensitive host directories are mounted beyond the app-specific volumes.
---
### 2. `privileged: true`
**Varför det behävs:**
- `/dev/dri` (Direct Rendering Infrastructure) is required for VAAPI/QSV hardware acceleration.
- On standard Linux, this device is accessible without privileged mode if the user is in the `video` or `render` group.
- ZimaOS does not reliably provide these groups in the container runtime context, making `privileged: true` the only reliable way to grant device access.
**Alternativ som utvärderats:**
- `security_opt: ["apparmor:..."]` with specific `/dev/dri` access: not reliably portable across ZimaOS kernel configurations.
- Pre-create device nodes with specific permissions: does not work dynamically when the device appears.
- Skip hardware acceleration (software encoding only): defeats the primary purpose of the app.
**Risker:**
- Container has full root capabilities on the host.
- If container is compromised, attacker has theoretical access to all host resources.
- Hardware acceleration devices can be accessed directly.
**Riskreducering:**
- `cap_drop: ["ALL"]` drops all capabilities even when privileged.
- Only the specific `/dev/dri` device is mounted; no other host devices.
- Data volumes are scoped to `/DATA/AppData/$AppID/...`.
---
### 3. Device mount: `/dev/dri:/dev/dri`
**Varför det behövs:**
- VAAPI and QSV hardware encoding require direct access to the GPU render nodes in `/dev/dri`.
- Without this mount, FFmpeg falls back to software encoding which is 1050x slower on 4K content.
**Alternativ som utvärderats:**
- Specific device nodes (e.g., `/dev/dri/renderD128`): device names can vary by driver version and host kernel.
- No hardware acceleration: software fallback is too slow for practical use.
**Risker:**
- The container can enumerate and use all graphics devices on the host.
- On multi-user systems, other users' GPU resources may be accessible.
**Riskreducering:**
- `privileged: true` combined with `cap_drop: ["ALL"]` ensures the container cannot load additional kernel modules or escalate privileges.
- Only the render nodes are exposed; no other host devices are passed through.
+103
View File
@@ -0,0 +1,103 @@
name: snacks
services:
snacks:
image: derekshreds/snacks-docker:2.3.1
container_name: snacks
restart: unless-stopped
deploy:
resources:
reservations:
memory: 1G
environment:
- TZ=Europe/Stockholm
- PUID=1000
- PGID=1000
- ASPNETCORE_ENVIRONMENT=Production
- SNACKS_WORK_DIR=/app/work
- FFMPEG_PATH=/usr/lib/jellyfin-ffmpeg/ffmpeg
- FFPROBE_PATH=/usr/lib/jellyfin-ffmpeg/ffprobe
network_mode: host
volumes:
- type: bind
source: /DATA/AppData/$AppID/media
target: /app/work/uploads
- type: bind
source: /DATA/AppData/$AppID/logs
target: /app/work/logs
- type: bind
source: /DATA/AppData/$AppID/config
target: /app/work/config
devices:
- /dev/dri:/dev/dri
privileged: true
security_opt:
- no-new-privileges:true
cap_drop:
- ALL
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:6767/Home/Health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
x-casaos:
envs:
- container: TZ
description:
en_US: Timezone, for example Europe/Stockholm
- container: PUID
description:
en_US: User ID for filesystem permissions
- container: PGID
description:
en_US: Group ID for filesystem permissions
- container: FFMPEG_PATH
description:
en_US: "FFmpeg binary path (default: /usr/lib/jellyfin-ffmpeg/ffmpeg). Use /usr/bin/ffmpeg on systems without jellyfin-ffmpeg."
- container: FFPROBE_PATH
description:
en_US: "FFprobe binary path (default: /usr/lib/jellyfin-ffmpeg/ffprobe). Use /usr/bin/ffprobe on systems without jellyfin-ffmpeg."
ports:
- container: "6767"
description:
en_US: Web UI port
volumes:
- container: /app/work/uploads
description:
en_US: Media library — source files to be encoded
- container: /app/work/logs
description:
en_US: Transcoding logs directory
- container: /app/work/config
description:
en_US: Application configuration and SQLite database
x-casaos:
architectures:
- amd64
main: snacks
category: phirna
author: Joachim Friberg
developer: Joachim Friberg
icon: https://cdn.simpleicons.org/snacks
tagline:
en_US: Automated video library encoder with hardware acceleration
description:
en_US: >-
Batch transcode your video library with hardware acceleration (NVENC, QSV, VAAPI, AMF).
Monitors directories, skips already-encoded files, and supports distributed cluster encoding.
Web UI at http://localhost:6767
title:
en_US: Snacks
index: /
port_map: "6767"
+15
View File
@@ -0,0 +1,15 @@
## Backlog
| # | Done | Name | Source | What | Agent instructions |
|---|---|---|---|---|---|
| 1 | [x] | Snacks | https://github.com/derekshreds/snacks | Automated video library encoder | Branch `snacks/initial/add-video-encoder`; implemented in `Apps/snacks/` |
## Adding a new app
1. Copy `Apps/_template/``Apps/<app-id>/`
2. Set `name` in compose (lowercase + hyphen only)
3. Pin image to explicit version/tag (no `:latest`); verify tag exists in registry
4. Add `x-casaos` metadata (title, description, icon, category, author, port_map)
5. Write `README.md` with purpose, ports, volumes, envs, and risk justifications
6. Validate: `./scripts/validate-appstore.sh`
7. Run final validation before release: `./scripts/validate-appstore.sh --enforce-risk-docs`
BIN
View File
Binary file not shown.