Scenario 1

Helm timed out before Grafana was ready.

The valuable move here was to distinguish a genuinely broken install from a local first-run startup delay inside the observability stack.

Symptomcontext deadline exceeded

HealthyPrometheus stack

LaggingGrafana init

OutcomeWait, don’t rewrite

Symptom

Helm reported a failed install

The shared observability installation returned context deadline exceeded. At first glance that looks like a broken configuration, but the pod view showed a more specific story: Prometheus, Alertmanager, the operator, and node-exporter were already healthy while Grafana was still initializing.

STATUS: pending-upgrade
Error: context deadline exceeded

kubectl get pods -n monitoring
kube-prometheus-stack-grafana-...   0/3   PodInitializing

Diagnosis

This was timing, not misconfiguration

On a first local run, Grafana can still be pulling and starting while Helm waits on the full stack. The right response was to verify component-level health instead of assuming the whole release was invalid.

Signal Helm stopped first

The wrapper timed out before every container became healthy.

Reading Grafana was still converging

The pod state showed initialization, not a terminal failure.

Decision Wait, then continue

Treat the event as rollout lag instead of rewriting the install path.

Evidence and recovery

These checks turn a generic Helm error into a specific readiness interpretation.

Validation commands

kubectl get pods -n monitoring -w
kubectl get events -n monitoring
kubectl describe pod -n monitoring <grafana-pod-name>

Correct decision

# Wait until Grafana reaches:
3/3 Running

# Then continue with the app path:
./application/payment-exception-review-service/create_local_app_with_helm.sh -s