Flask: Monitoring with Prometheus
Monitoring is crucial for understanding the performance, health, and behavior of Flask applications in production. Prometheus, an open-source monitoring and alerting toolkit, integrates seamlessly with Flask to collect metrics like request latency, error rates, and resource usage. This tutorial explores monitoring Flask applications with Prometheus, covering setup, metric collection, visualization with Grafana, and best practices for effective observability.
01. Why Monitor Flask with Prometheus?
Prometheus enables real-time monitoring and alerting, helping developers detect issues, optimize performance, and ensure reliability. Flask, being a lightweight framework, benefits from Prometheus’s pull-based metric collection, which integrates with tools like prometheus_client
to expose application metrics. Monitoring ensures proactive issue resolution, scalability, and compliance with service-level objectives (SLOs).
Example: Basic Prometheus Metrics in Flask
from flask import Flask
from prometheus_client import start_http_server, Counter

app = Flask(__name__)

# Define a Prometheus counter metric
request_count = Counter('flask_requests_total', 'Total number of Flask requests')

@app.route('/')
def index():
    request_count.inc()  # Increment counter on each request
    return "Hello, Flask!"

if __name__ == '__main__':
    start_http_server(8000)  # Start Prometheus metrics server on a separate port
    # use_reloader=False keeps the debug reloader from re-binding port 8000 on restart
    app.run(debug=True, port=5000, use_reloader=False)
Output:
* Flask running on http://127.0.0.1:5000
* Prometheus metrics exposed on http://127.0.0.1:8000/metrics
(Metrics output includes: flask_requests_total 1.0)
Explanation:
- prometheus_client - Provides tools to define and expose metrics.
- start_http_server - Runs a metrics endpoint for Prometheus scraping.
- Counter - Tracks the total number of requests.
02. Key Monitoring Techniques
Monitoring Flask with Prometheus involves defining custom metrics, integrating with middleware, and visualizing data. These techniques provide comprehensive insights into application performance. The table below summarizes key techniques and their applications:
| Technique | Description | Use Case |
|---|---|---|
| Custom Metrics | Define counters, gauges, histograms | Track requests, latency, errors |
| Middleware Integration | Automatically collect request metrics | Monitor all endpoints |
| Prometheus Configuration | Scrape Flask metrics | Centralized metric collection |
| Grafana Dashboards | Visualize metrics | Real-time performance insights |
| Alerting Rules | Notify on anomalies | Proactive issue detection |
2.1 Custom Metrics with Prometheus
Example: Counter, Gauge, and Histogram
from flask import Flask
from prometheus_client import start_http_server, Counter, Gauge, Histogram

app = Flask(__name__)

# Define metrics
request_count = Counter('flask_requests_total', 'Total requests', ['endpoint'])
active_users = Gauge('flask_active_users', 'Number of active users')
request_latency = Histogram('flask_request_latency_seconds', 'Request latency', ['endpoint'])

@app.route('/')
@request_latency.labels(endpoint='/').time()  # Record how long the view takes
def index():
    request_count.labels(endpoint='/').inc()
    active_users.inc()  # Gauge goes up while the request is handled...
    response = "Hello, Flask!"
    active_users.dec()  # ...and back down when it finishes
    return response

if __name__ == '__main__':
    start_http_server(8000)
    app.run(debug=True, port=5000, use_reloader=False)  # Avoid re-binding the metrics port
Output (http://127.0.0.1:8000/metrics):
flask_requests_total{endpoint="/"} 1.0
flask_active_users 0.0
flask_request_latency_seconds_bucket{endpoint="/",le="0.005"} 1.0
...
Explanation:
- Counter - Tracks total requests per endpoint.
- Gauge - Monitors active users, increasing/decreasing as needed.
- Histogram - Measures the request latency distribution.
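prometheus_client also ships decorator/context-manager helpers for these patterns. The sketch below uses the library's Gauge.track_inprogress() and Histogram.time() helpers to avoid the manual inc()/dec() and timing calls; the metric names here are illustrative.

from flask import Flask
from prometheus_client import start_http_server, Gauge, Histogram

app = Flask(__name__)

in_progress = Gauge('flask_requests_in_progress', 'Requests currently being handled')
latency = Histogram('flask_request_latency_seconds', 'Request latency', ['endpoint'])

@app.route('/')
@in_progress.track_inprogress()        # Increments on entry, decrements on exit
@latency.labels(endpoint='/').time()   # Observes elapsed time automatically
def index():
    return "Hello, Flask!"

if __name__ == '__main__':
    start_http_server(8000)
    app.run(port=5000)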
2.2 Middleware for Automatic Metrics
Example: Request Metrics Middleware
from flask import Flask, request
from prometheus_client import Counter, Histogram, make_wsgi_app
from werkzeug.middleware.dispatcher import DispatcherMiddleware
import time

app = Flask(__name__)

# Define metrics
request_count = Counter('flask_requests_total', 'Total requests', ['method', 'endpoint'])
request_latency = Histogram('flask_request_latency_seconds', 'Request latency', ['endpoint'])

# Middleware hooks to track metrics
@app.before_request
def before_request():
    request.start_time = time.time()

@app.after_request
def after_request(response):
    endpoint = request.endpoint or 'unknown'
    request_count.labels(method=request.method, endpoint=endpoint).inc()
    latency = time.time() - request.start_time
    request_latency.labels(endpoint=endpoint).observe(latency)
    return response

# Add Prometheus WSGI middleware to serve /metrics
app.wsgi_app = DispatcherMiddleware(app.wsgi_app, {
    '/metrics': make_wsgi_app()
})

@app.route('/')
def index():
    return "Hello, Flask!"

if __name__ == '__main__':
    app.run(debug=True, port=5000)
Output (http://127.0.0.1:5000/metrics):
flask_requests_total{method="GET",endpoint="index"} 1.0
flask_request_latency_seconds_sum{endpoint="index"} 0.002
...
Explanation:
- before_request/after_request - Track request metrics automatically.
- make_wsgi_app - Exposes metrics at /metrics.
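If you prefer not to hand-roll these hooks, the community prometheus_flask_exporter package wraps the same pattern. The sketch below assumes that package is installed (pip install prometheus-flask-exporter); it registers default per-endpoint request counters and latency histograms and serves them at /metrics.

from flask import Flask
from prometheus_flask_exporter import PrometheusMetrics

app = Flask(__name__)
# Registers default request metrics and a /metrics route automatically
metrics = PrometheusMetrics(app)

@app.route('/')
def index():
    return "Hello, Flask!"

if __name__ == '__main__':
    app.run(debug=True, port=5000)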
2.3 Configuring Prometheus to Scrape Metrics
Example: Prometheus Configuration
# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'flask_app'
    static_configs:
      - targets: ['localhost:5000']
Flask App:
from flask import Flask
from prometheus_client import make_wsgi_app
from werkzeug.middleware.dispatcher import DispatcherMiddleware

app = Flask(__name__)
app.wsgi_app = DispatcherMiddleware(app.wsgi_app, {
    '/metrics': make_wsgi_app()
})

@app.route('/')
def index():
    return "Monitored Flask App"

if __name__ == '__main__':
    app.run(debug=True, port=5000)
Output (Prometheus Web UI: http://localhost:9090):
Metrics scraped from http://localhost:5000/metrics
Explanation:
- prometheus.yml - Configures Prometheus to scrape Flask's /metrics endpoint every 15 seconds.
- Prometheus stores metrics for querying and alerting.
2.4 Visualizing Metrics with Grafana
Example: Grafana Dashboard Setup
Steps:
- Run Grafana: docker run -d -p 3000:3000 grafana/grafana
- Add Prometheus as a data source in Grafana (URL: http://localhost:9090).
- Create a dashboard with queries like rate(flask_requests_total[5m]).

Flask App (from 2.3): Exposes /metrics.
Output (Grafana Web UI: http://localhost:3000):
Dashboard shows request rate, latency histograms, etc.
Explanation:
- Grafana visualizes Prometheus metrics in customizable dashboards.
- Queries like rate() show request rates over time.
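Before building panels, it can help to confirm that a PromQL expression actually returns data. The sketch below queries Prometheus's HTTP API directly; it assumes Prometheus is reachable at http://localhost:9090 and that the third-party requests package is installed.

import requests

PROMETHEUS_URL = 'http://localhost:9090/api/v1/query'

def run_query(promql):
    # Execute an instant PromQL query against Prometheus's HTTP API
    response = requests.get(PROMETHEUS_URL, params={'query': promql}, timeout=5)
    response.raise_for_status()
    payload = response.json()
    if payload.get('status') != 'success':
        raise RuntimeError(f'Query failed: {payload}')
    return payload['data']['result']

if __name__ == '__main__':
    # The same expression used in the Grafana panel above
    for sample in run_query('rate(flask_requests_total[5m])'):
        print(sample['metric'], sample['value'])

Each result sample carries the metric's label set and current value, which mirrors what Grafana plots.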
2.5 Setting Up Alerting Rules
Example: Prometheus Alerting
# prometheus.yml
rule_files:
  - 'alerts.yml'
scrape_configs:
  - job_name: 'flask_app'
    static_configs:
      - targets: ['localhost:5000']

# alerts.yml
groups:
  - name: flask_alerts
    rules:
      - alert: HighErrorRate
        expr: rate(flask_requests_total{status="500"}[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} requests/sec"
Output (Prometheus Alerts):
Alert triggered if 500 errors exceed 0.1 requests/sec for 2 minutes
Explanation:
- alerts.yml - Defines rules that trigger alerts based on metrics; the expression assumes flask_requests_total carries a status label (as in the example in Section 3.1).
- Integrates with notification systems like Alertmanager for email/Slack alerts, as sketched below.
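Alertmanager forwards firing alerts to configured receivers; since the application is already Flask, a small webhook endpoint can log or fan out those notifications. This is only a sketch: the route path and port are arbitrary choices, and the payload fields read here (alerts, status, labels, annotations) follow Alertmanager's webhook format, so verify them against your Alertmanager version before relying on them.

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/alertmanager-webhook', methods=['POST'])
def alertmanager_webhook():
    payload = request.get_json(silent=True) or {}
    for alert in payload.get('alerts', []):
        # Log each alert; swap this print for Slack, email, or paging integration
        print(alert.get('status'), alert.get('labels'), alert.get('annotations'))
    return jsonify({'received': len(payload.get('alerts', []))})

if __name__ == '__main__':
    app.run(port=5001)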
2.6 Incorrect Metrics Setup
Example: Missing Metrics Endpoint
from flask import Flask
from prometheus_client import Counter

app = Flask(__name__)
request_count = Counter('flask_requests_total', 'Total requests')

@app.route('/')
def index():
    request_count.inc()
    return "Hello, Flask!"

if __name__ == '__main__':
    app.run(debug=True, port=5000)
Output:
* Running on http://127.0.0.1:5000
(Prometheus cannot scrape metrics; no /metrics endpoint)
Explanation:
- Missing make_wsgi_app or start_http_server prevents metric exposure.
- Solution: Add a metrics endpoint or a separate metrics server, as shown below.
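A minimal fix reuses the same pattern as the earlier examples: mount the Prometheus WSGI app so /metrics is served on the application's own port and becomes scrapeable.

from flask import Flask
from prometheus_client import Counter, make_wsgi_app
from werkzeug.middleware.dispatcher import DispatcherMiddleware

app = Flask(__name__)
request_count = Counter('flask_requests_total', 'Total requests')

# Mount the Prometheus WSGI app so Prometheus can scrape /metrics
app.wsgi_app = DispatcherMiddleware(app.wsgi_app, {
    '/metrics': make_wsgi_app()
})

@app.route('/')
def index():
    request_count.inc()
    return "Hello, Flask!"

if __name__ == '__main__':
    app.run(debug=True, port=5000)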
03. Effective Usage
3.1 Recommended Practices
- Use middleware to automate metric collection for all routes.
Example: Comprehensive Monitoring Setup
from flask import Flask, request
from prometheus_client import Counter, Histogram, make_wsgi_app
from werkzeug.middleware.dispatcher import DispatcherMiddleware
import time

app = Flask(__name__)

# Define metrics
request_count = Counter('flask_requests_total', 'Total requests', ['method', 'endpoint', 'status'])
request_latency = Histogram('flask_request_latency_seconds', 'Request latency', ['endpoint'])

# Middleware for metrics
@app.before_request
def before_request():
    request.start_time = time.time()

@app.after_request
def after_request(response):
    endpoint = request.endpoint or 'unknown'
    status = str(response.status_code)
    request_count.labels(method=request.method, endpoint=endpoint, status=status).inc()
    latency = time.time() - request.start_time
    request_latency.labels(endpoint=endpoint).observe(latency)
    return response

# Metrics endpoint
app.wsgi_app = DispatcherMiddleware(app.wsgi_app, {
    '/metrics': make_wsgi_app()
})

@app.route('/')
def index():
    return "Monitored Flask App"

@app.route('/error')
def error():
    return "Error", 500

if __name__ == '__main__':
    app.run(debug=True, port=5000)
Prometheus Configuration (prometheus.yml):
scrape_configs:
  - job_name: 'flask_app'
    static_configs:
      - targets: ['localhost:5000']
Output (http://localhost:5000/metrics):
flask_requests_total{method="GET",endpoint="index",status="200"} 1.0
flask_request_latency_seconds_sum{endpoint="index"} 0.001
- Automates metric collection for requests, latency, and status codes.
- Exposes metrics at /metrics for Prometheus scraping.
- Ready for Grafana visualization and alerting.
3.2 Practices to Avoid
- Avoid exposing metrics endpoints publicly without authentication.
Example: Public Metrics Exposure
from flask import Flask
from prometheus_client import make_wsgi_app
from werkzeug.middleware.dispatcher import DispatcherMiddleware

app = Flask(__name__)
app.wsgi_app = DispatcherMiddleware(app.wsgi_app, {
    '/metrics': make_wsgi_app()
})

@app.route('/')
def index():
    return "Flask App"

if __name__ == '__main__':
    app.run(debug=True, port=5000, host='0.0.0.0')
Output:
* Running on http://0.0.0.0:5000
(/metrics accessible publicly, exposing sensitive data)
- A public /metrics endpoint risks leaking application data.
- Solution: Restrict access with firewall rules or authentication, as sketched below.
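One lightweight option, sketched below under the assumption that a single shared credential is acceptable and would be loaded from configuration rather than hard-coded, is to serve /metrics from a regular Flask view guarded by HTTP Basic Auth instead of mounting it unauthenticated. Prometheus's scrape configuration supports basic_auth, so the scraper can still reach the endpoint.

from flask import Flask, request, Response
from prometheus_client import Counter, generate_latest, CONTENT_TYPE_LATEST

app = Flask(__name__)
request_count = Counter('flask_requests_total', 'Total requests')

# Illustrative credentials only; load real values from config or a secrets store
METRICS_USER = 'prometheus'
METRICS_PASSWORD = 'change-me'

@app.route('/metrics')
def metrics():
    auth = request.authorization
    if not auth or auth.username != METRICS_USER or auth.password != METRICS_PASSWORD:
        # Reject unauthenticated scrapes
        return Response('Unauthorized', 401, {'WWW-Authenticate': 'Basic realm="metrics"'})
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)

@app.route('/')
def index():
    request_count.inc()
    return "Flask App"

if __name__ == '__main__':
    app.run(debug=True, port=5000)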
04. Common Use Cases
4.1 Monitoring API Performance
Track request latency and error rates for APIs.
Example: API Metrics
from flask import Flask, jsonify, request
from prometheus_client import Counter, Histogram, make_wsgi_app
from werkzeug.middleware.dispatcher import DispatcherMiddleware
import time

app = Flask(__name__)

request_count = Counter('api_requests_total', 'Total API requests', ['endpoint', 'status'])
request_latency = Histogram('api_request_latency_seconds', 'API latency', ['endpoint'])

@app.before_request
def before_request():
    request.start_time = time.time()

@app.after_request
def after_request(response):
    endpoint = request.endpoint or 'unknown'
    status = str(response.status_code)
    request_count.labels(endpoint=endpoint, status=status).inc()
    latency = time.time() - request.start_time
    request_latency.labels(endpoint=endpoint).observe(latency)
    return response

app.wsgi_app = DispatcherMiddleware(app.wsgi_app, {
    '/metrics': make_wsgi_app()
})

@app.route('/api/data')
def data():
    return jsonify({'data': 'secure'})

if __name__ == '__main__':
    app.run(debug=True, port=5000)
Output (http://localhost:5000/metrics):
api_requests_total{endpoint="data",status="200"} 1.0
api_request_latency_seconds_sum{endpoint="data"} 0.003
Explanation:
- Tracks API request counts and latency per endpoint.
- Useful for identifying slow or error-prone endpoints.
4.2 Tracking User Activity
Monitor user interactions for analytics and performance.
Example: User Activity Metrics
from flask import Flask, request
from prometheus_client import Counter, make_wsgi_app
from werkzeug.middleware.dispatcher import DispatcherMiddleware

app = Flask(__name__)
login_count = Counter('flask_logins_total', 'Total user logins', ['username'])

app.wsgi_app = DispatcherMiddleware(app.wsgi_app, {
    '/metrics': make_wsgi_app()
})

@app.route('/login', methods=['POST'])
def login():
    username = request.form.get('username', 'anonymous')
    login_count.labels(username=username).inc()
    return "Login recorded"

if __name__ == '__main__':
    app.run(debug=True, port=5000)
Output (http://localhost:5000/metrics):
flask_logins_total{username="alice"} 1.0
Explanation:
- Tracks login events per user for activity analysis.
- Can be visualized in Grafana for user engagement insights.
Conclusion
Monitoring Flask applications with Prometheus provides deep insights into performance and reliability. Key takeaways:
- Use prometheus_client to define counters, gauges, and histograms for custom metrics.
- Automate metric collection with middleware and expose it via /metrics.
- Configure Prometheus for scraping and Grafana for visualization.
- Set up alerting rules to detect issues proactively.
With these practices, you can build observable Flask applications that ensure high performance and reliability!