Scenario 4

Local Grafana persistence caused a repeatable permission failure.

This was the strongest troubleshooting story because it moved from diagnosis into a durable repository design change.

Symptom: Init:CrashLoopBackOff
Container: init-chown-data
Root cause: PVC permissions
Permanent fix: No local persistence

What failed

Grafana was the only unhealthy component

Prometheus, Alertmanager, and the operator were healthy, but Grafana stayed stuck in Init:CrashLoopBackOff. The init container logs showed permission errors under /var/lib/grafana.

chown: /var/lib/grafana/pdf: Permission denied
chown: /var/lib/grafana/png: Permission denied
chown: /var/lib/grafana/csv: Permission denied

Why this mattered

The fix changed the local platform design

The fix was a deliberate design change, not a reinstall-until-it-works workaround. The repo now treats the local environment as disposable: local installs disable Grafana persistence, while dev / AKS keeps persistence enabled.
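As a rough illustration of that split, the sketch below picks the Grafana persistence setting per environment. The function name and the environment labels are hypothetical, and how the repo's install script actually applies the setting is not shown in this write-up. The only chart detail relied on is the grafana.persistence.enabled value of the kube-prometheus-stack chart.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the repo): map an environment name
# to the Helm value that controls Grafana persistence.
grafana_persistence_flag() {
  case "$1" in
    local)
      # Local clusters are disposable, so no PVC is requested.
      echo "grafana.persistence.enabled=false"
      ;;
    *)
      # dev / AKS keep dashboards and state across restarts.
      echo "grafana.persistence.enabled=true"
      ;;
  esac
}

# Possible usage, assuming the prometheus-community chart repo is already added:
#   helm upgrade --install kube-prometheus-stack \
#     prometheus-community/kube-prometheus-stack \
#     -n monitoring --set "$(grafana_persistence_flag local)"
```

With persistence disabled, Grafana no longer requests a PVC, so there is no pre-existing volume with mismatched ownership for the init container to trip over.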

Signal: Only Grafana was unhealthy

The rest of the observability stack reached a stable state.

Reading: PVC ownership was the source of friction

The init container failed on local volume permissions, not on Helm config.

Decision: Disable local persistence

Keep the local platform fast and disposable while retaining persistence in AKS.

Diagnosis checklist

kubectl -n monitoring get pods
kubectl -n monitoring logs <grafana-pod> -c init-chown-data
kubectl -n monitoring get pvc
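
The log check in the checklist can be turned into a quick yes/no test. The following is a minimal sketch, assuming the init container prints lines in the same form as the errors shown earlier; the function name is hypothetical and it reads the log text from standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the piped-in log text contains
# a chown "Permission denied" line, fail (non-zero) otherwise.
has_chown_permission_error() {
  grep -Eq '^chown: .*: Permission denied$'
}

# Possible usage (replace the pod name placeholder with the real Grafana pod):
#   kubectl -n monitoring logs GRAFANA_POD -c init-chown-data \
#     | has_chown_permission_error && echo "PVC ownership problem"
```

A match points at volume ownership rather than Helm configuration, which is the case the recovery path addresses.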

Recovery path

helm uninstall kube-prometheus-stack -n monitoring
kubectl -n monitoring delete pvc kube-prometheus-stack-grafana --ignore-not-found
./platform/kubernetes-resources/observability/install_local_observability_stack.sh