Observability · 9 min read

Production Monitoring with Prometheus & Grafana — Complete Setup

Gulshan Kumar
5 December 2024

The Observability Stack


Good observability answers three questions instantly:

1. Is it broken? (Alerting)

2. Where is it broken? (Metrics + Tracing)

3. Why did it break? (Logs)


Prometheus Setup



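# prometheus.yml
global:
  scrape_interval: 15s          # scrape every target every 15 seconds

scrape_configs:
  - job_name: 'nodejs-app'
    metrics_path: '/metrics'    # Prometheus default, stated explicitly
    static_configs:
      - targets: ['app:3000']   # host:port where the app serves its metrics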
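
The `app:3000` target resolves only if Prometheus and the app share a network where `app` is a hostname. A rough sketch of one way to get that, assuming Docker Compose (the post does not say how the stack is deployed; the service names, host ports, and `build: .` are placeholders):

```yaml
# docker-compose.yml (illustrative sketch, not the author's actual deployment)
services:
  app:
    build: .                      # assumed: the Node.js app is built from this directory
    ports:
      - "3000:3000"               # the app serves /metrics on port 3000

  prometheus:
    image: prom/prometheus
    volumes:
      # mount the config above over the image's default config path
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"               # Prometheus UI and API

  grafana:
    image: grafana/grafana
    ports:
      - "3001:3000"               # Grafana listens on 3000 in the container; 3001 avoids clashing with the app
```

With a layout like this, Compose's default network lets Prometheus reach the app by its service name, which is why the scrape target can be written as `app:3000`.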

Key Alerts I Use



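groups:
  - name: production
    rules:
      - alert: HighErrorRate
        # Share of all requests that returned a 5xx status over the last 5 minutes
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Error rate > 5% for 2 minutes"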
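
Prometheus only evaluates a rule group if the file containing it is listed in `prometheus.yml`, and it only delivers firing alerts if it knows where an Alertmanager lives. A minimal sketch of that wiring, assuming the group above is saved as `alerts.yml` next to the config and that an Alertmanager is reachable at `alertmanager:9093` (both names are assumptions, not from the original setup):

```yaml
# prometheus.yml (additions; file name and Alertmanager address are placeholders)
rule_files:
  - 'alerts.yml'                  # path is relative to prometheus.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']   # 9093 is Alertmanager's default port
```

Running `promtool check rules alerts.yml` before reloading Prometheus catches syntax errors in the rule file.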

Grafana Dashboard Tips


  • Use a dashboard variable for the service name, so one dashboard covers all services
  • Add SLO panels (target 99.9% uptime = 0.1% of 8,760h, about 8.76h downtime/year budget)
  • Route alerts to Slack/PagerDuty by severity tier
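
Severity-based routing could look roughly like the sketch below if it is handled by Alertmanager rather than Grafana's own alerting (the post does not specify which). The receiver names, Slack channel, webhook URL, and PagerDuty key are all placeholders, and sending everything non-critical to Slack is an assumed policy:

```yaml
# alertmanager.yml (illustrative sketch; all names and keys are placeholders)
route:
  receiver: slack-default         # fallback for alerts not matched below
  group_by: ['alertname']
  routes:
    - matchers:
        - severity="critical"     # the label set on HighErrorRate
      receiver: pagerduty-critical

receivers:
  - name: slack-default
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE_ME'
        channel: '#alerts'
  - name: pagerduty-critical
    pagerduty_configs:
      - routing_key: 'REPLACE_ME' # PagerDuty Events API v2 integration key
```

The idea is that only page-worthy alerts reach PagerDuty, while lower tiers land in a channel that can be reviewed during working hours.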

Result


End-to-end visibility across 15+ services. MTTR dropped from 45 minutes to under 10 minutes.
