FigJam Diagram: K3s Homelab — Cluster Overview
| Device | Role | Specs | IP |
|---|---|---|---|
| ThinkCentre M920q (pve1) | Proxmox host | i5-8500T 6c/6t, 24GB RAM, 512GB NVMe | 192.168.1.105 (Default VLAN) |
| ThinkCentre M920q (pve2) | Proxmox host | i5-8500T 6c/6t, 32GB RAM, 512GB NVMe | 192.168.1.106 (Default VLAN) |
| ThinkCentre M920q (pve3) | Proxmox host | i5-8500T 6c/6t, 32GB RAM, 512GB NVMe | 192.168.1.107 (Default VLAN) |
| ThinkCentre M920q (pve4) | Proxmox host + GPU worker | i7-8700T 6c/12t, 32GB RAM, 512GB NVMe | 192.168.1.108 (Default VLAN) |
| Kasa HS300 | Smart power strip | 6 outlets, power monitoring — controls pve1(0), pve2(1), pve3(2), pve4(5) | 192.168.1.205 |

| Node | Role | Arch | IP | OS | CPU | RAM | Boot Disk | Longhorn |
|---|---|---|---|---|---|---|---|---|
| k3s-server-1 | Control Plane | amd64 | 192.168.20.20 | Debian 13 | 2 | 8 GB | 80 GB | ~79 GiB (replica scheduling banned) |
| k3s-server-2 | Control Plane | amd64 | 192.168.20.21 | Debian 13 | 2 | 8 GB | 80 GB | ~79 GiB (replica scheduling banned) |
| k3s-server-3 | Control Plane | amd64 | 192.168.20.22 | Debian 13 | 2 | 8 GB | 80 GB | ~79 GiB (replica scheduling banned) |
| k3s-agent-1 | Worker | amd64 | 192.168.20.30 | Debian 13 | 6 | 14 GB | 300 GB | ~295 GiB |
| k3s-agent-2 | Worker | amd64 | 192.168.20.31 | Debian 13 | 6 | 22 GB | 300 GB | ~295 GiB |
| k3s-agent-3 | Worker | amd64 | 192.168.20.32 | Debian 13 | 6 | 22 GB | 300 GB | ~295 GiB |
| k3s-agent-4 | Worker (GPU) | amd64 | 192.168.20.33 | Debian 13 | 12 | 28 GB | 450 GB | disabled |
Effective Longhorn capacity: ~885 GiB raw across agents 1–3 (~442 GiB usable with 2 replicas). Server node disks are registered but replica scheduling is permanently banned to preserve control plane stability.
Each ThinkCentre M920q (pve1/2/3) has an i5-8500T (6c/6t) and 512GB NVMe. pve1 has 24GB RAM, pve2/pve3 have 32GB. Per host: server VM (2c/8GB) + agent VM (6c/14-22GB). Server nodes were increased from 6GB → 8GB after etcd + daemonsets saturated 6GB RAM. pve4 has an i7-8700T (6c/12t) with 32GB — one VM worker (k3s-agent-4: 12c/28GB) with Intel UHD 630 VFIO passthrough. Longhorn is disabled on k3s-agent-4 (aging NVMe on pve4).
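The server-node replica-scheduling ban is enforced on the Longhorn Node objects. A minimal sketch of what that looks like, as an assumption of how it is configured here (the disk key and reserved-storage value are illustrative; in practice these CRs are created by Longhorn and edited via kubectl or the Longhorn UI):

```yaml
# Hedged sketch: ban replica scheduling on a control-plane node while keeping
# its disk registered. Disk key and storageReserved are illustrative values.
apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: k3s-server-1
  namespace: longhorn-system
spec:
  allowScheduling: false        # no new replicas land on this node
  disks:
    default-disk:               # hypothetical disk key; Longhorn generates the real one
      path: /var/lib/longhorn
      allowScheduling: false    # disk stays visible in the UI but takes no replicas
      storageReserved: 0
```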
| Segment | Range | Purpose |
|---|---|---|
| Hypervisors | 192.168.1.105-108 | Proxmox hosts (Default VLAN; 192.168.20.x sub-interfaces) |
| Control Plane | 192.168.20.20-22 | k3s server VMs |
| Workers (VM) | 192.168.20.30-33 | k3s agent VMs (pve1/2/3 + pve4) |
| NAS (Ugreen DXP4800) | 192.168.30.10 | Storage VLAN 30 -- NFS /volume1/media + /volume1/proxmox, 802.3ad LACP LAG (5Gbps) |
| MetalLB Pool | 192.168.20.200-220 | LoadBalancer virtual IPs |
| Kasa Power Strip | 192.168.1.205 | Smart power control for pve1–4 — Proxmox Watchdog target |
Gateway / DNS: 192.168.1.1 (UniFi Gateway) + 1.1.1.1 (Cloudflare)
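A minimal sketch of the MetalLB L2 configuration implied by the pool above; only the address range comes from this table, the resource names are assumptions:

```yaml
# Hedged sketch of the MetalLB pool and L2 advertisement (metallb.io/v1beta1,
# matching MetalLB v0.14.x). Names "default-pool" / "default-l2" are assumed.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.20.200-192.168.20.220
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
```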
WireGuard: UDP 51821 is port-forwarded by UDM Pro directly to k3s-server-1 (192.168.20.20:51821). WireGuard runs as a host-level systemd service (wg1 interface) on k3s-server-1 — it does not go through MetalLB or Traefik. The mesh-peers namespace contains peer RBAC, not the WireGuard process itself.
| Namespace | Application | Key Resources |
|---|---|---|
| open-webui | OpenClaw Chat Gateway | Open-WebUI + LiteLLM proxy + Bedrock/OpenRouter |
| openclaw-ops | OpenClaw Ops Agent | FastAPI event ingest + PostgreSQL + pgvector |
| openclaw-personal | OpenClaw Personal Agent | FastAPI + job/resume/interview tools + PostgreSQL |
| alert-responder | AI Alert Analysis | Flask + Bedrock Nova Micro + Slack Socket Mode |
| auto-brand | AI Video Factory | 7 Deployments + NATS + PostgreSQL + Redis |
| polymarket-lab | Prediction Market Research | TimescaleDB + 6 scaffolded services (replicas: 0) + RAG bridge; uses GHCR |
| cluster-health-monitor | Auto-Remediation | Hourly CronJob; remediates CrashLoopBackOff, PVC expansion, cert renewal |
| rag | Vector DB Platform | Qdrant StatefulSet + rag-ingester (every 10 min) + repo-ingester |
| cardboard | TCG Price Tracker | Deployment + PostgreSQL + CronJob (scraper) |
| dev-workspace | Remote Dev | ARCHIVED — manifests in _archived/dev-workspace/, namespace not deployed |
| digital-signage | Kiosk Displays | 8 Deployments + MQTT + PostgreSQL |
| dnd | D&D Multiplayer Platform | FastAPI backend + frontend + Discord bot + LiveKit voice + PostgreSQL + pgvector + Redis |
| email-gateway | SMTP Relay | Postfix → AWS SES relay; ClusterIP :587; used by AlertManager |
| ham | Habit Tracker | React SPA + Fastify API + PostgreSQL + Anthropic AI coach |
| home-assistant | Smart Home | StatefulSet (hostNetwork) + LoadBalancer |
| jupyter | Notebook Server | Harbor-deployed Jupyter; 5Gi Longhorn PVC; Prometheus/Loki cluster access |
| kube-utils | Security Honeypot | metrics-collector mimics node-exporter (:9100) on LoadBalancer IP; logs unauthorized probes; ServiceMonitor + InternalServiceProbed alert |
| media | Media Stack | Jellyfin HA + Jellyseerr + arr suite + qBittorrent |
| trade-bot | TXXD Trading Bot | Deployment + PostgreSQL |
| wiki | Wiki.js | Deployment + PostgreSQL |
| proxmox-watchdog | Proxmox Host Monitor | Deployment (hostNetwork) + Kasa HS300 KLAP; auto power-cycles pve1–4 |
| authentik | SSO / Identity Provider | Helm chart + PostgreSQL (no Redis since 2025.10) |
| public-ingress | Authentik Middleware | Traefik forwardAuth middleware definitions |
| mesh-peers | WireGuard Peer RBAC | Per-peer namespaces (bryce, jake, steve) + collective-deployer RBAC — WireGuard process is systemd on k3s-server-1 |
| wireguard-exporter | WireGuard Metrics | Prometheus exporter for mesh peer stats (pinned to k3s-server-1, hostNetwork) |
| harbor | Container Registry | Helm chart -- staging + production projects |
| gitea | Package Registry | Helm chart -- PyPI/npm/generic only (SQLite backend) |
| gha-dashboard | GitHub Actions Monitor | Python app + SQLite + CronJob |
| aws-lens | AWS Cost Viewer | Deployment + IAM credentials |
| aja-recipes | Personal Recipe App | Deployment + PostgreSQL 16-alpine (10Gi PVC), HPA (2–10 replicas) |
| media-profiler | Media Profile Generator | FastAPI + PostgreSQL, open Gmail OAuth2, public-facing |
| security-scanner | Web Vulnerability Scanner | Multi-module security scanner; no static manifest (CI-deployed from separate repo) |
| velero | Cluster Object Backup | Daily 2AM (30d TTL) + weekly Sundays 3AM (~360d TTL) + monthly 1st 4AM (365d TTL); k8s objects only, not PVC data |
| monitoring | Observability | Prometheus + Grafana + AlertManager + Loki + exporters |
| arc-runner-system | GitHub Runners + Actions Cache | ARC runner scale sets (k3s-runner-v2), 8 replicas; actions-cache (falcondev-oss/github-actions-cache-server, 50Gi NFS PVC) |
| cert-manager | TLS Automation | Let's Encrypt DNS-01 via Route53 |
| kube-system | Core k3s | Traefik, CoreDNS, Metrics Server |
| longhorn-system | Storage | Longhorn engine + replicas + UI |
| metallb-system | Load Balancing | MetalLB L2 speaker + controller |
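For reference, the daily Velero schedule described in the velero row above (2AM, 30-day retention) maps onto a Schedule resource roughly like the sketch below; the object name and the choice to skip volume snapshots are assumptions, consistent with the "k8s objects only" note:

```yaml
# Hedged sketch of the daily backup schedule (velero.io/v1). 30 days = 720h TTL.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup            # assumed name
  namespace: velero
spec:
  schedule: "0 2 * * *"         # every day at 02:00
  template:
    ttl: 720h0m0s               # 30-day retention
    snapshotVolumes: false      # objects only; PVC data is not backed up here
```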
Note: `hmb` (Flutter CRM) and `opendonor` (Django CRM) are developed in external repositories and are not deployed to this cluster. They have no Kubernetes manifests in this repo.
| Component | Version |
|---|---|
| k3s | v1.34.5+k3s1 |
| Longhorn | v1.8.2 |
| MetalLB | v0.14.3 |
| cert-manager | v1.14.2 |
| Traefik | Bundled with k3s |
| Authentik | v2026.2.1 (Helm) |
| ARC | Runner scale sets (k3s-runner-v2) |
| Velero | Deployed (daily + weekly + monthly schedules) |
| PostgreSQL | 16-alpine |
| Prometheus (kube-prometheus-stack) | Helm |
| Loki | Helm |
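cert-manager issues certificates with Let's Encrypt DNS-01 against the Route53 hosted zone listed below. A hedged ClusterIssuer sketch for letsencrypt-prod; the secret names, key names, and region are assumptions, only the zone ID and issuer name come from this document:

```yaml
# Hedged sketch of a DNS-01/Route53 ClusterIssuer (cert-manager.io/v1).
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-account-key     # assumed secret name
    solvers:
      - dns01:
          route53:
            region: us-east-1                # assumed region
            hostedZoneID: Z1LLBOMFGEFI6S
            accessKeyIDSecretRef:            # inline accessKeyID also works on older releases
              name: route53-credentials      # assumed secret
              key: access-key-id
            secretAccessKeySecretRef:
              name: route53-credentials
              key: secret-access-key
        selector:
          dnsZones:
            - strommen.systems
```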
*.k3s.internal.strommen.systems -> 192.168.20.200 (Traefik)
*.k3s.strommen.systems -> public IP (DDNS via CronJob)
Route53 hosted zone Z1LLBOMFGEFI6S for strommen.systems
ClusterIssuers: letsencrypt-staging and letsencrypt-prod
auth.k3s.strommen.systems -- forwardAuth for all public routes
pve4 runs a single VM worker (k3s-agent-4) with Intel UHD 630 VFIO passthrough active. Longhorn is disabled on this node (aging NVMe). GPU workloads are scheduled with the gpu=intel-uhd-630 node label.
GPU consumers: Jellyfin (real-time QSV transcode), Plex (real-time QSV transcode), Tdarr (batch re-encoding, DaemonSet).
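A sketch of how one of these consumers might be pinned to the GPU node and given access to the iGPU for QSV; the image, replica count, and privileged hostPath approach are assumptions, not taken from this repo's manifests:

```yaml
# Hedged sketch: schedule a transcode workload on k3s-agent-4 via the
# gpu=intel-uhd-630 label and expose /dev/dri for Quick Sync.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jellyfin
  namespace: media
spec:
  replicas: 1
  selector:
    matchLabels: { app: jellyfin }
  template:
    metadata:
      labels: { app: jellyfin }
    spec:
      nodeSelector:
        gpu: intel-uhd-630        # lands the pod on the GPU worker
      containers:
        - name: jellyfin
          image: jellyfin/jellyfin:latest   # assumed image
          securityContext:
            privileged: true      # simplest /dev/dri access; an Intel device plugin is the stricter alternative
          volumeMounts:
            - name: dri
              mountPath: /dev/dri
      volumes:
        - name: dri
          hostPath:
            path: /dev/dri
```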
Note: A previous LXC experiment (k3s-agent-5/6) on pve4 was abandoned in March 2026 due to DNS failures and networking instability. Do NOT use LXC on pve4 again.