FigJam Diagram: Log Aggregation — CloudTrail and Anthropic to Loki (expires 2026-04-13)
Two hourly CronJobs in the monitoring namespace pull external data sources into Loki for long-term queryability alongside Prometheus metrics.
Manifests: kubernetes/apps/log-aggregation/
CronJob: cloudtrail-to-loki
| Field | Value |
|---|---|
| Namespace | monitoring |
| Schedule | 15 * * * * (15 min past every hour) |
| Image | python:3.12-slim |
| Lookback window | 2 hours |
| Concurrency | Forbid |
| Timeout | 5 minutes |
| Secret | cloudtrail-reader-credentials |
| Loki stream | job=cloudtrail, source=aws-cloudtrail |
What it does:
- Calls GetCallerIdentity to resolve the AWS account ID
- Reads the .json.gz CloudTrail log files under AWSLogs/<account>/CloudTrail/<region>/<date>/ that were modified within the lookback window
Event filter logic:
| Category | Examples | Action |
|---|---|---|
| Auth failures | AccessDenied, UnauthorizedAccess | Always include |
| Auth events | ConsoleLogin, SwitchRole, AssumeRole, GetCallerIdentity | Always include |
| Mutating ops | Create*, Delete*, Terminate*, Run*, Attach*, Modify* | Include |
| Read-only noise | Describe*, List*, Get*, Head* | Skip |
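The filter table can be read as an ordered set of checks. The sketch below is one possible reading, not the job's actual code: the function name `should_include`, the substring match on the error code, the check order, and the choice to skip unmatched events are all assumptions. Order matters because GetCallerIdentity is listed as an auth event but would also match the read-only Get* pattern.

```python
from fnmatch import fnmatchcase

# Patterns copied from the filter table; the evaluation order is an assumption.
AUTH_FAILURES = ("AccessDenied", "UnauthorizedAccess")
AUTH_EVENTS = {"ConsoleLogin", "SwitchRole", "AssumeRole", "GetCallerIdentity"}
MUTATING = ("Create*", "Delete*", "Terminate*", "Run*", "Attach*", "Modify*")
READ_ONLY = ("Describe*", "List*", "Get*", "Head*")


def should_include(event_name: str, error_code: str = "") -> bool:
    """Return True if an event should be shipped to Loki (hypothetical helper)."""
    # Auth failures are always included, even on otherwise read-only calls.
    if any(marker in error_code for marker in AUTH_FAILURES):
        return True
    # Exact auth events are checked before the read-only patterns,
    # otherwise GetCallerIdentity would be dropped by Get*.
    if event_name in AUTH_EVENTS:
        return True
    if any(fnmatchcase(event_name, pattern) for pattern in MUTATING):
        return True
    if any(fnmatchcase(event_name, pattern) for pattern in READ_ONLY):
        return False
    # The table does not cover unmatched events; skipping them is a guess.
    return False
```

Under these assumptions, `should_include("DescribeInstances")` is skipped, while the same call with `error_code="AccessDenied"` is kept.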
Loki log entry fields:
{
"event": "CreateUser",
"source": "iam.amazonaws.com",
"user": "arn:aws:iam::123456789012:user/example",
"region": "us-east-1",
"error": "",
"sourceIP": "1.2.3.4",
"userAgent": "AWS Console",
"resources": ["arn:aws:iam::..."]
}
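As a rough illustration of how a raw CloudTrail record could map onto the entry above and onto a Loki push request, here is a minimal sketch. The left-hand field names and the stream labels come from this page. The CloudTrail field mapping (`eventName`, `userIdentity.arn`, `errorCode`, and so on) and the helper names `to_loki_line` and `to_push_payload` are assumptions about how the job might work. The payload shape follows Loki's `/loki/api/v1/push` JSON format.

```python
import json
from datetime import datetime, timezone


def to_loki_line(record: dict) -> str:
    """Flatten one CloudTrail record into the JSON log line (assumed mapping)."""
    entry = {
        "event": record.get("eventName", ""),
        "source": record.get("eventSource", ""),
        "user": (record.get("userIdentity") or {}).get("arn", ""),
        "region": record.get("awsRegion", ""),
        "error": record.get("errorCode", ""),
        "sourceIP": record.get("sourceIPAddress", ""),
        "userAgent": record.get("userAgent", ""),
        "resources": [r["ARN"] for r in (record.get("resources") or []) if "ARN" in r],
    }
    return json.dumps(entry)


def to_push_payload(records: list[dict]) -> dict:
    """Build a request body for POST /loki/api/v1/push."""
    values = []
    for record in records:
        # CloudTrail eventTime is UTC, e.g. 2024-01-01T12:00:00Z.
        ts = datetime.strptime(record["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
        ts = ts.replace(tzinfo=timezone.utc)
        # Loki expects the timestamp as a string of nanoseconds since the epoch.
        nanos = int(ts.timestamp()) * 1_000_000_000
        values.append([str(nanos), to_loki_line(record)])
    labels = {"job": "cloudtrail", "source": "aws-cloudtrail"}
    return {"streams": [{"stream": labels, "values": values}]}
```

Keeping only `job` and `source` as stream labels, with everything else inside the JSON line, keeps label cardinality low. The other fields stay queryable through `| json`.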
CronJob: anthropic-usage-to-loki
| Field | Value |
|---|---|
| Namespace | monitoring |
| Schedule | 30 * * * * (30 min past every hour) |
| Image | python:3.12-slim |
| Lookback window | 2 hours (usage) / 2 days (cost — catches late-arriving data) |
| Concurrency | Forbid |
| Timeout | 2 minutes |
| Secret | anthropic-admin-api-key |
| Loki stream | job=anthropic-usage, source=anthropic-admin-api |
What it does:
- GET /v1/organizations/usage_report/messages: hourly buckets grouped by model
- GET /v1/organizations/cost_report: daily buckets grouped by description
Why Loki (not just Prometheus): Prometheus has 14-day retention by default. Loki provides long-term queryable log history for cost trends and model usage analysis beyond that window.
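The two lookback windows from the table (2 hours for usage, 2 days for cost) have to become concrete time ranges on each run. The sketch below shows one way to compute them. The alignment to hour and day boundaries and the helper names are assumptions, and the exact request parameter names should be taken from the Anthropic Admin API reference rather than from this example.

```python
from datetime import datetime, timedelta, timezone


def _rfc3339(ts: datetime) -> str:
    """Format a UTC datetime as an RFC 3339 timestamp with a Z suffix."""
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


def usage_window(now: datetime) -> tuple[str, str]:
    """2-hour lookback, with the start snapped down to the hour (hourly buckets)."""
    now = now.astimezone(timezone.utc)
    start = (now - timedelta(hours=2)).replace(minute=0, second=0, microsecond=0)
    return _rfc3339(start), _rfc3339(now)


def cost_window(now: datetime) -> tuple[str, str]:
    """2-day lookback, with the start snapped down to midnight UTC (daily buckets)."""
    now = now.astimezone(timezone.utc)
    start = (now - timedelta(days=2)).replace(hour=0, minute=0, second=0, microsecond=0)
    return _rfc3339(start), _rfc3339(now)
```

For a run at 14:30 UTC on 2025-01-10, this gives a usage range starting at 12:00 the same day and a cost range starting at 00:00 on 2025-01-08.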
Usage log entry fields:
{
"type": "usage",
"model": "claude-sonnet-4-6",
"input_tokens": 12500,
"output_tokens": 3200,
"cache_read_tokens": 8000,
"cache_create_tokens": 0
}
Cost log entry fields:
{
"type": "cost",
"description": "Claude Sonnet API",
"model": "claude-sonnet-4-6",
"cost_usd": 0.0342
}
Both secrets must be created out-of-band (never committed to git).
Created from Terraform outputs after terraform apply:
kubectl create secret generic cloudtrail-reader-credentials -n monitoring \
--from-literal=AWS_ACCESS_KEY_ID="<from terraform output cloudtrail_reader_access_key_id>" \
--from-literal=AWS_SECRET_ACCESS_KEY="<from terraform output cloudtrail_reader_secret_access_key>" \
--from-literal=S3_BUCKET="<from terraform output cloudtrail_s3_bucket>"
Reused by the Anthropic cost Prometheus exporter (same secret, already exists in monitoring namespace if the exporter is deployed):
kubectl create secret generic anthropic-admin-api-key -n monitoring \
--from-literal=ANTHROPIC_ADMIN_API_KEY="<admin API key from Anthropic console>"
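A job can fail fast when its secret is missing or incomplete. The sketch below assumes the secret keys are exposed to the container as environment variables of the same name (for example via `envFrom`), which this page does not confirm. The `REQUIRED` mapping and `missing_settings` helper are hypothetical.

```python
import os

# Secret keys taken from the kubectl commands above, grouped per CronJob.
REQUIRED = {
    "cloudtrail-to-loki": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET"),
    "anthropic-usage-to-loki": ("ANTHROPIC_ADMIN_API_KEY",),
}


def missing_settings(job: str, env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED[job] if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings("cloudtrail-to-loki")
    if missing:
        # Exit non-zero so the failed Job is visible in the cluster.
        raise SystemExit(f"missing settings: {', '.join(missing)}")
```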
All CloudTrail security events:
{job="cloudtrail"} | json | line_format "{{.event}} by {{.user}} from {{.sourceIP}}"
Failed calls only (any event with a non-empty error, which includes auth failures):
{job="cloudtrail"} | json | error != ""
Anthropic token usage by model (last 24h):
sum by (model) (
  sum_over_time({job="anthropic-usage"} | json | type="usage" | unwrap input_tokens [24h])
)
Daily Anthropic cost:
sum(
  sum_over_time({job="anthropic-usage"} | json | type="cost" | unwrap cost_usd [24h])
)
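Outside Grafana, the same queries can be sent to Loki's HTTP API. The sketch below only builds the request URL for the `/loki/api/v1/query_range` endpoint. The base URL `http://loki.monitoring.svc:3100` and the helper name are placeholders, not values taken from this deployment.

```python
from urllib.parse import urlencode

LOKI_URL = "http://loki.monitoring.svc:3100"  # hypothetical in-cluster address


def query_range_url(logql: str, start_ns: int, end_ns: int,
                    limit: int = 100, base_url: str = LOKI_URL) -> str:
    """Build a GET URL for Loki's query_range endpoint."""
    params = {
        "query": logql,          # LogQL expression, URL-encoded below
        "start": str(start_ns),  # range start, nanoseconds since the epoch
        "end": str(end_ns),      # range end, nanoseconds since the epoch
        "limit": str(limit),     # maximum number of log lines to return
    }
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen`.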
| CronJob | CPU request | CPU limit | Mem request | Mem limit |
|---|---|---|---|---|
| cloudtrail-to-loki | 50m | 200m | 128Mi | 256Mi |
| anthropic-usage-to-loki | 10m | 100m | 64Mi | 128Mi |
Refinement note: Verify that the cloudtrail-reader-credentials secret exists in the monitoring namespace (kubectl get secret cloudtrail-reader-credentials -n monitoring) before assuming CloudTrail sync is active. The Terraform module provisioning the IAM user and S3 bucket must be applied first.