#82: Monitoring in MLOps: Tools, Tips, and Best Practices from the Field ~ Data Science Deep Dive Podcast

How do you keep track of things once data science services are
running in production? In this episode, Sebastian and Michelle
discuss how to set up a sensible monitoring stack, from logs and
metrics to alerts and dashboards. We look at tools such as
Prometheus, Grafana, Loki, and ELK and clarify how they differ.
We also cover best practices for alerting, useful feedback loops,
and the question of when and how to integrate monitoring into
the development process.

**Summary**

Goal of monitoring: fast feedback loops between development
and production

Difference between CI/CD and monitoring: monitoring delivers
feedback after deployment

Ideally, plan monitoring as early as the architecture stage

Overview of monitoring targets: services, infrastructure,
data, models

Comparison of cloud vs. self-hosted monitoring (effort,
flexibility, cost)

Key tools: Prometheus/Grafana/Loki, ELK Stack,
Nagios/Icinga/Zabbix, Great Expectations, Redash/Metabase

Best practices for alerting: sensible thresholds,
avoiding "alert fatigue", clear ownership

Conclusion: monitoring needs clear goals, meaningful alerts, and
good visualization to deliver real value
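
The alerting points in the summary (sensible thresholds, avoiding alert fatigue, clear ownership) can be illustrated with a small sketch. It is not taken from the episode: the class name `SustainedThresholdAlert`, the 5% error-rate threshold, the 120-second hold time, and the owner `ml-platform` are all hypothetical. The idea is similar in spirit to the `for` clause in Prometheus alerting rules: an alert fires only once a value has stayed above its threshold for a minimum duration, and it names who is responsible.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SustainedThresholdAlert:
    """Fire only after a metric stays above a threshold for a minimum time."""

    threshold: float     # hypothetical limit, e.g. 0.05 for a 5% error rate
    hold_seconds: float  # how long the breach must last before alerting
    owner: str           # hypothetical team responsible for reacting
    _breach_started: Optional[float] = field(default=None, init=False)
    _firing: bool = field(default=False, init=False)

    def observe(self, value: float, timestamp: float) -> Optional[str]:
        """Feed one sample; return a message once when the alert starts firing."""
        if value <= self.threshold:
            # Back to normal: reset so a new breach starts a fresh timer.
            self._breach_started = None
            self._firing = False
            return None
        if self._breach_started is None:
            self._breach_started = timestamp
        sustained = timestamp - self._breach_started >= self.hold_seconds
        if sustained and not self._firing:
            # Notify once per sustained breach instead of on every sample.
            self._firing = True
            return (
                f"[{self.owner}] value {value:.3f} above "
                f"{self.threshold:.3f} for {self.hold_seconds:.0f}s"
            )
        return None


def run_demo() -> List[str]:
    """Replay made-up (timestamp, error_rate) samples and collect alerts."""
    alert = SustainedThresholdAlert(
        threshold=0.05, hold_seconds=120, owner="ml-platform"
    )
    samples = [
        (0, 0.02), (60, 0.09), (120, 0.03),   # short spike, no alert
        (180, 0.08), (240, 0.07), (300, 0.09), (360, 0.10),  # sustained breach
    ]
    messages = []
    for timestamp, value in samples:
        message = alert.observe(value, timestamp)
        if message is not None:
            messages.append(message)
    return messages


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

In this made-up data the single spike at 60 seconds is ignored, and the breach that starts at 180 seconds produces exactly one notification, which is one simple way to reduce noise. In a real setup this logic would usually live in the monitoring stack itself rather than in application code.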

**Links**

#23: Unsexy but Important: Tests and Monitoring
https://www.podbean.com/ew/pb-vxp58-13f311a

Prometheus – open-source monitoring system:
https://prometheus.io

Grafana – visualization of metrics and logs:
https://grafana.com

Loki – log aggregation for Grafana:
https://grafana.com/oss/loki/

ELK Stack (Elasticsearch, Logstash, Kibana):
https://www.elastic.co/elastic-stack

Great Expectations – data validation and monitoring:
https://greatexpectations.io

Redash – SQL-based dashboards and visualizations:
https://redash.io

Metabase – self-service BI tool: https://www.metabase.com

Nagios – classic system monitoring tool:
https://www.nagios.org

Icinga – modern Nagios fork: https://icinga.com

Zabbix – monitoring platform for networks and servers:
https://www.zabbix.com

Prometheus Alertmanager:
https://prometheus.io/docs/alerting/latest/alertmanager/

PagerDuty – Incident Response Management:
https://www.pagerduty.com

Questions, feedback, or topic requests? Feel free to write to us at:
podcast@inwt-statistics.de
