Scenario 4
Local Grafana persistence caused a repeatable permission failure.
This was the strongest troubleshooting story because it moved from diagnosis into a durable repository design change.
Symptom: Init:CrashLoopBackOff
Container: init-chown-data
Root cause: PVC permissions
Permanent fix: No local persistence
What failed
Grafana was the only unhealthy component
Prometheus, Alertmanager, and the operator were healthy, but Grafana stayed stuck in Init:CrashLoopBackOff. The init container logs showed permission errors under /var/lib/grafana.
chown: /var/lib/grafana/pdf: Permission denied
chown: /var/lib/grafana/png: Permission denied
chown: /var/lib/grafana/csv: Permission denied
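To confirm that an init container log matches this failure pattern, a small filter along the following lines can help. It is only a sketch: the function names chown_denied_paths and is_chown_permission_failure are hypothetical, and the pattern assumes the log lines have exactly the "chown: <path>: Permission denied" shape shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repo) for reading init-chown-data logs.

# Print every path that chown could not change, one per line.
# Assumes lines shaped like: "chown: <path>: Permission denied".
chown_denied_paths() {
  sed -n 's/^chown: \(.*\): Permission denied$/\1/p'
}

# Exit 0 when the log on stdin contains at least one chown permission error.
is_chown_permission_failure() {
  grep -Eq '^chown: .*: Permission denied$'
}
```

Piping the output of kubectl -n monitoring logs <grafana-pod> -c init-chown-data into either function separates a volume-permission failure from other init errors.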
Why this mattered
The fix changed the local platform design
The failure was not papered over by reinstalling until it worked. The repo now treats local Grafana persistence as unnecessary in a disposable environment: local installs disable persistence, while dev / AKS keeps it enabled.
Signal
Only Grafana was unhealthy
The rest of the observability stack reached a stable state.
Reading
PVC ownership was the blocker
The init container failed on local volume permissions, not on Helm config.
Decision
Disable local persistence
Keep the local platform fast and disposable while retaining persistence in AKS.
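One way to express this decision is a per-environment switch that feeds a Helm value. The sketch below is an assumption about how that could look, not the repo's actual install script: the function name grafana_persistence_arg and the environment names are hypothetical, while grafana.persistence.enabled is the persistence toggle of the Grafana subchart bundled with kube-prometheus-stack.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an environment name to a Helm --set argument.
# local      -> no PVC, so init-chown-data has no volume to fail on
# dev / aks  -> persistence stays enabled
grafana_persistence_arg() {
  case "$1" in
    local)   echo "grafana.persistence.enabled=false" ;;
    dev|aks) echo "grafana.persistence.enabled=true" ;;
    *)       echo "unknown environment: $1" >&2; return 2 ;;
  esac
}
```

The result could then be passed as --set "$(grafana_persistence_arg local)" to the helm install or upgrade command for the kube-prometheus-stack release. Rejecting unknown environment names keeps a typo from silently choosing a persistence mode.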
Diagnosis checklist
kubectl -n monitoring get pods
kubectl -n monitoring logs <grafana-pod> -c init-chown-data
kubectl -n monitoring get pvc
Recovery path
helm uninstall kube-prometheus-stack -n monitoring
kubectl -n monitoring delete pvc kube-prometheus-stack-grafana --ignore-not-found
./platform/kubernetes-resources/observability/install_local_observability_stack.sh
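The recovery steps above can be wrapped so they are previewed before anything is deleted. This is a minimal sketch under stated assumptions: the run and recover_local_grafana functions and the DRY_RUN variable are hypothetical, and stopping at the first failed step is a design choice made here, not something the original procedure specifies.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print each step when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Runs the three recovery steps in order and stops at the first failure,
# so the PVC is not deleted while the release is still installed.
recover_local_grafana() {
  run helm uninstall kube-prometheus-stack -n monitoring || return 1
  run kubectl -n monitoring delete pvc kube-prometheus-stack-grafana --ignore-not-found || return 1
  run ./platform/kubernetes-resources/observability/install_local_observability_stack.sh
}
```

Calling DRY_RUN=1 recover_local_grafana prints the three commands without touching the cluster; calling it without DRY_RUN executes them from the repository root.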