- `zolty-mat`
- `ZoltyMat`
- `zolty-mat/home_k3s_cluster`
- `k3s-homelab`
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_ACCOUNT_ID` (855878721457), `GH_PAT`

| Component | Version | Notes |
|---|---|---|
| Proxmox VE | 8.x | Hypervisor |
| Terraform | 1.13+ | bpg/proxmox provider v0.98.1 (~> 0.50) |
| k3s | v1.34.5+k3s1 | HA embedded etcd |
| Debian | 13 (Trixie) | All VMs |
| MetalLB | v0.14.3 | L2 mode |
| Longhorn | v1.8.2 | 2 replicas, auto-balance best-effort; banned on server nodes; disabled on k3s-agent-4 |
| cert-manager | v1.14.2 | DNS-01 via Route53 |
| ARC | Runner scale sets | k3s-runner-v2, migrated 2026-04-04 |
| Authentik | v2026.2.1 | SSO, replaced OAuth2 Proxy 2026-04-04 |
| Prometheus + Grafana | kube-prometheus-stack | Helm |
| Loki + Promtail | grafana/loki-stack | Helm |
| Velero | latest | Kubernetes object backup (daily + weekly), S3 backend |

Detailed specs, teardown photos, and maintenance notes: Hardware Reference

| Segment | Range | Purpose |
|---|---|---|
| Gateway/DNS | 192.168.1.1 | UniFi router, DNS |
| USW Aggregation | 192.168.1.96 | Ubiquiti USW Aggregation (8x SFP+), 10GbE backbone |
| Proxmox hosts | 192.168.1.105-108 | pve1, pve2, pve3, pve4 |
| K3s Servers | 192.168.20.20-22 | Control plane VMs |
| K3s Agents | 192.168.20.30-33 | Worker VMs (amd64) |
| NAS (Ugreen DXP4800) | 192.168.30.10 | Storage VLAN 30 -- NFS, 802.3ad LACP LAG (5Gbps) |
| MetalLB Pool | 192.168.20.200-220 | LoadBalancer IPs |
| Traefik LB | 192.168.20.200 | Primary ingress (internal + public) |
| Dev Workspace SSH (Mat) | 192.168.20.201 | ARCHIVED — SSH to code-server-mat (namespace not deployed) |
| Home Assistant LB | 192.168.20.202 | HA LoadBalancer |
| Mosquitto LB | 192.168.20.203 | MQTT for Digital Signage |
| Dev Workspace SSH (Aja) | 192.168.20.204 | ARCHIVED — SSH to code-server-aja (namespace not deployed) |
| Seedbox (RapidSeedbox) | 45.128.27.65 | External -- SFTP :2222, rclone syncs content to NAS |
| Kasa HS300 | 192.168.1.205 | Smart power strip controlling pve1(0), pve2(1), pve3(2), pve4(5) — Proxmox Watchdog target |

Proxmox hosts pve1, pve2, and pve3 each have a Mellanox ConnectX-3 EN 10GbE NIC connected to the USW Aggregation switch via a Cable Matters 1m SFP+ DAC cable. Each of these hosts uses an active-backup bond (`bond0`) with the 10GbE NIC as primary and the onboard 1GbE Intel I219-LM as failover.

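To confirm which NIC a bond is currently using, the Linux bonding driver exposes its state under `/proc/net/bonding/`. The following is a minimal sketch, assuming the bond is named `bond0` as described above; the helper name `active_bond_slave` is hypothetical and not part of the repository.

```bash
#!/usr/bin/env bash
# active_bond_slave (hypothetical helper): print the interface that is
# currently carrying traffic for an active-backup bond.
# The status file defaults to bond0, the bond name used on the Proxmox hosts.
active_bond_slave() {
  local status_file=${1:-/proc/net/bonding/bond0}
  # The bonding driver reports a line such as "Currently Active Slave: <iface>".
  awk -F': ' '/^Currently Active Slave:/ { print $2 }' "$status_file"
}

# Only query the real bond when the status file exists on this machine.
if [[ -r /proc/net/bonding/bond0 ]]; then
  echo "bond0 active interface: $(active_bond_slave)"
fi
```

If the reported interface is the onboard 1GbE NIC rather than the 10GbE NIC, the bond has failed over.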
| Node | Role | Arch | IP | OS | Resources |
|---|---|---|---|---|---|
| k3s-server-1 | server | amd64 | 192.168.20.20 | Debian 13 | 2c / 8GB / 80GB |
| k3s-server-2 | server | amd64 | 192.168.20.21 | Debian 13 | 2c / 8GB / 80GB |
| k3s-server-3 | server | amd64 | 192.168.20.22 | Debian 13 | 2c / 8GB / 80GB |
| k3s-agent-1 | worker | amd64 | 192.168.20.30 | Debian 13 | 6c / 14GB / 300GB |
| k3s-agent-2 | worker | amd64 | 192.168.20.31 | Debian 13 | 6c / 22GB / 300GB |
| k3s-agent-3 | worker | amd64 | 192.168.20.32 | Debian 13 | 6c / 22GB / 300GB |
| k3s-agent-4 | worker (GPU) | amd64 | 192.168.20.33 | Debian 13 | 12c / 28GB / 450GB |

Server node memory was increased from 6GB to 8GB after etcd and the k3s daemonsets saturated 6GB of RAM. pve1 has 24GB of physical RAM (limited agent headroom); pve2, pve3, and pve4 have 32GB each.

| Node | Boot Disk | Longhorn Capacity | Notes |
|---|---|---|---|
| k3s-server-1/2/3 | 80GB | — | Replica scheduling permanently banned — etcd + daemonsets only |
| k3s-agent-1/2/3 | 300GB | ~295 GiB each | /var/lib/longhorn/ on boot disk, active |
| k3s-agent-4 | 450GB | disabled | Aging NVMe on pve4 — excluded from Longhorn |
| Total usable | | ~885 GiB raw (~442 GiB w/ 2 replicas) | |

The Longhorn dashboard reports ~2.5 TiB in total because it counts all disk capacity, not just the replicated portion. Effective replicated capacity is ~442 GiB.

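The raw and effective figures follow directly from the table: three active agents at roughly 295 GiB each, divided by the replica count. A minimal sketch of that arithmetic, with variable names chosen here for illustration only:

```bash
#!/usr/bin/env bash
# Figures taken from the storage table; variable names are illustrative.
agent_disk_gib=295   # approximate Longhorn capacity per active agent
active_agents=3      # k3s-agent-1/2/3 (k3s-agent-4 is excluded)
replicas=2           # Longhorn replica count

raw_gib=$(( agent_disk_gib * active_agents ))
# Bash integer division truncates, so 885 / 2 yields 442 rather than 442.5.
effective_gib=$(( raw_gib / replicas ))

echo "raw=${raw_gib}GiB effective=${effective_gib}GiB"
```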
- `best-effort`
- `node.longhorn.io/create-default-disk: false` annotation + storage scheduled = false

OpenClaw dual internal URLs:
`chat.k3s.internal.strommen.systems` is the Open WebUI Helm chart (chat interface). `openclaw.k3s.internal.strommen.systems` is the custom OpenClaw gateway service. Both live in the `open-webui` namespace but are distinct services. The public route `chat.k3s.strommen.systems` resolves to the Open WebUI.

- `use_lockfile = true` (NOT `dynamodb_table` -- deprecated in 1.13+)
- `k3s-homelab-tfstate-855878721457` (us-east-1, versioned, encrypted)
- `aws/terraform.tfstate`, `homelab-prod/terraform.tfstate`
- `k3s-homelab-ci` (path: `/system/`)
- `proxmox_nic_fix` Ansible role disables hardware offloading via ethtool
- `pve{1,2,3}.strommen.systems`
- `letsencrypt-staging` and `letsencrypt-prod` (both Ready)
- `acme-v02` not `acme-v2`
- `Z1LLBOMFGEFI6S` for `strommen.systems`
- `*.k3s.internal.strommen.systems` -> 192.168.20.200 (internal)
- `*.k3s.strommen.systems` -> public IP (DDNS-managed, Route53)
- `ansible/.env` (not committed). Source before running Ansible.
- `group_vars/all.yml`
- `controller-manager` in `arc-runner-system` (key: `github_token`)
- `TF_VAR_proxmox_password` / `PVE_PASS`
- `harbor-pull-secret` in each app namespace (static robot token, no expiry)

```bash
./scripts/bootstrap.sh                      # Full cluster deployment
./scripts/status.sh                         # Check cluster status
./scripts/upgrade-k3s.sh v1.34.5+k3s1       # Upgrade k3s
./scripts/recreate-node.sh k3s-agent-1      # Recreate a failed node
cd terraform/environments/homelab-prod && terraform plan && terraform apply
cd ansible && ansible-playbook -i inventory/homelab playbooks/site.yml
ssh -F ssh_config k3s-server-1              # SSH to cluster nodes
```

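Because `ansible/.env` has to be sourced before running Ansible, one way to make that step harder to forget is a small loader that exports everything the file defines. This is a sketch only: it assumes the file uses plain shell `VAR=value` syntax, and `load_env_file` is a hypothetical helper, not a script in the repository.

```bash
#!/usr/bin/env bash
# load_env_file (hypothetical helper): export every variable assigned in a
# dotenv-style file so that child processes such as ansible-playbook inherit it.
# Assumption: the file contains plain shell VAR=value lines.
load_env_file() {
  local env_file=${1:-ansible/.env}
  if [[ ! -f $env_file ]]; then
    echo "missing env file: $env_file" >&2
    return 1
  fi
  set -a   # auto-export every variable assigned while sourcing
  # shellcheck disable=SC1090
  source "$env_file"
  set +a   # restore normal (non-exporting) assignment behavior
}

# Example usage from the repository root:
#   load_env_file ansible/.env && \
#     (cd ansible && ansible-playbook -i inventory/homelab playbooks/site.yml)
```

Returning a non-zero status when the file is missing lets the `&&` chain stop before Ansible runs without credentials.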
| Constraint | Detail |
|---|---|
| Longhorn server scheduling | Permanently banned — etcd + daemonsets saturate server node memory |
| Longhorn agent-4 | Disabled — aging NVMe on pve4 |
| Container builds | --platform linux/amd64 --provenance=false required for all builds |
| Deployment strategy | Recreate (not RollingUpdate) for all Longhorn RWO PVC workloads |
| Service selector | MUST include app.kubernetes.io/component: web (avoids postgres routing 502s) |
| k3s agent upgrade | Wipes env — restore K3S_TOKEN + K3S_URL after every agent upgrade |
| kubectl timeout | Always --timeout=300s minimum for rollout status |
| Prometheus/Grafana storage | Use nfs-monitoring StorageClass — never Longhorn |
| UFW + new nodes | ufw allow from <ip> on ALL existing nodes before adding new node |
| LACP switch order | Change NAS to 802.3ad FIRST, then enable switch aggregation — reversing crashes UGOS |
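Several of the constraints above are flag requirements that are easy to encode in wrappers. The sketch below illustrates three of them (build flags, rollout timeout, UFW rules) under stated assumptions: the function names, the image tag `registry.example/app:dev`, the new-node address `192.168.20.34`, and the use of `sudo` on the nodes are all hypothetical. It prints commands by default and only executes them when `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Dry-run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
run() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build with the flags the constraints table requires for every image.
build_image() {
  local tag=$1 context=${2:-.}
  run docker buildx build --platform linux/amd64 --provenance=false -t "$tag" "$context"
}

# Wait for a rollout, never with less than the 300s minimum timeout.
wait_for_rollout() {
  local deployment=$1 namespace=$2 timeout=${3:-300}
  if (( timeout < 300 )); then timeout=300; fi
  run kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout="${timeout}s"
}

# Node names from the node table; every existing node needs the UFW rule
# before a new node joins. Using sudo here is an assumption.
EXISTING_NODES=(k3s-server-1 k3s-server-2 k3s-server-3
                k3s-agent-1 k3s-agent-2 k3s-agent-3 k3s-agent-4)
allow_new_node() {
  local new_ip=$1 node
  for node in "${EXISTING_NODES[@]}"; do
    run ssh -F ssh_config "$node" sudo ufw allow from "$new_ip"
  done
}

# Hypothetical values, shown in dry-run mode.
build_image registry.example/app:dev
wait_for_rollout app default 60
allow_new_node 192.168.20.34
```

Clamping the timeout inside `wait_for_rollout` means a caller that passes a shorter value still gets the 300s minimum rather than a premature failure.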