Course Outline
Introduction to AIOps with Open Source Tools
- Overview of AIOps concepts and benefits
- The role of Prometheus and Grafana in the observability stack
- Where Machine Learning fits into AIOps: predictive versus reactive analytics
Setting Up Prometheus and Grafana
- Installing and configuring Prometheus for time series data collection
- Creating dashboards in Grafana using real-time metrics
- Exploring exporters, relabeling, and service discovery
Data Preprocessing for Machine Learning
- Extracting and transforming Prometheus metrics
- Preparing datasets for anomaly detection and forecasting
- Using Grafana transformations or Python pipelines
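
As an illustration of the extraction step, here is a minimal sketch that flattens a range-query result into rows suitable for a dataset. It assumes the JSON layout returned by the Prometheus HTTP API for range queries (a "matrix" of series, each with [timestamp, "value"] pairs). The sample payload, the metric name, and the helper name matrix_to_rows are hypothetical and are not part of the course material.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample shaped like a Prometheus range-query response.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"__name__": "node_load1", "instance": "host-a:9100"},
                "values": [
                    [1700000000, "0.42"],
                    [1700000060, "0.57"],
                    [1700000120, "NaN"],
                ],
            }
        ],
    },
})


def matrix_to_rows(payload: str) -> list[dict]:
    """Flatten a range-query matrix into one row per (series, sample)."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError("query did not succeed")
    rows = []
    for series in body["data"]["result"]:
        labels = series["metric"]
        for ts, raw in series["values"]:
            value = float(raw)  # sample values arrive as strings
            if value != value:  # NaN is the only float not equal to itself
                continue
            rows.append({
                "time": datetime.fromtimestamp(ts, tz=timezone.utc),
                "instance": labels.get("instance", ""),
                "value": value,
            })
    return rows


rows = matrix_to_rows(SAMPLE_RESPONSE)
for row in rows:
    print(row["time"].isoformat(), row["instance"], row["value"])
```

Dropping NaN samples is one possible cleaning choice; depending on the model, interpolating the gap may be preferable.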
Applying Machine Learning for Anomaly Detection
- Foundational ML models for outlier detection (e.g., Isolation Forest, One-Class SVM)
- Training and evaluating models on time series data
- Visualizing anomalies within Grafana dashboards
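
To give a feel for outlier detection on a metric series, the sketch below uses a modified z-score based on the median absolute deviation (MAD). This is a deliberately simpler stand-in, not Isolation Forest or One-Class SVM, chosen so the example needs only the standard library. The function name mad_outliers, the threshold of 3.5, and the CPU values are assumptions for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All points sit at the median (or more than half do); no usable scale.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical CPU utilisation samples with one obvious spike.
cpu = [0.41, 0.39, 0.43, 0.40, 0.42, 0.95, 0.38, 0.41]
print(mad_outliers(cpu))
```

The returned indices could be written back as a separate series or annotation so that anomalies appear alongside the raw metric on a dashboard.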
Forecasting Metrics with Machine Learning
- Building basic forecasting models (introduction to ARIMA, Prophet, and LSTM)
- Predicting system load or resource usage
- Leveraging predictions for proactive alerting and scaling decisions
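
The idea of acting on a prediction rather than on the current value can be sketched with simple exponential smoothing. This is a much simpler technique than ARIMA, Prophet, or LSTM and is used here only to keep the example dependency-free. The function name ses_forecast, the smoothing factor, the load values, and the scaling threshold are all assumptions.

```python
def ses_forecast(values, alpha=0.5):
    """One-step-ahead forecast via simple exponential smoothing."""
    if not values:
        raise ValueError("need at least one observation")
    level = values[0]
    for v in values[1:]:
        # Blend the newest observation with the running level.
        level = alpha * v + (1 - alpha) * level
    return level


# Hypothetical request-rate samples and a hypothetical capacity threshold.
load = [10.0, 12.0, 14.0, 16.0]
SCALE_THRESHOLD = 14.0

forecast = ses_forecast(load)
should_scale = forecast > SCALE_THRESHOLD
print(forecast, should_scale)
```

Simple exponential smoothing lags behind a steady trend, so a production setup would more likely use one of the trend-aware models listed above.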
Integrating Machine Learning with Alerting and Automation
- Defining alert rules based on ML output or dynamic thresholds
- Configuring Alertmanager and notification routing
- Triggering scripts or automation workflows upon anomaly detection
Scaling and Operationalizing AIOps
- Integrating external observability tools (e.g., ELK stack, Moogsoft, Dynatrace)
- Operationalizing ML models within observability pipelines
- Best practices for deploying AIOps at scale
Summary and Next Steps
Requirements
- A solid understanding of system monitoring and observability principles
- Prior experience using Grafana or Prometheus
- Proficiency in Python and knowledge of fundamental machine learning concepts
Target Audience
- Observability engineers
- Infrastructure and DevOps teams
- Monitoring platform architects and Site Reliability Engineers (SREs)
Duration
- 14 hours