Course Outline
Foundations of Agentic Systems in Production
- Agentic architectures: loops, tools, memory, and orchestration layers
- Agent lifecycle: development, deployment, and continuous operation
- Challenges associated with managing agents at production scale
Infrastructure and Deployment Models
- Deploying agents in containerized and cloud environments
- Scaling patterns: horizontal vs. vertical scaling, concurrency, and throttling
- Multi-agent orchestration and workload balancing
Monitoring and Observability
- Key metrics: latency, success rate, memory usage, and agent call depth
- Tracing agent activity and call graphs
- Implementing observability using Prometheus, OpenTelemetry, and Grafana
Logging, Auditing, and Compliance
- Centralized logging and structured event collection
- Compliance and auditability within agentic workflows
- Designing audit trails and replay mechanisms for debugging
Performance Tuning and Resource Optimization
- Reducing inference overhead and optimizing agent orchestration cycles
- Model caching and lightweight embeddings for faster retrieval
- Load testing and stress scenarios for AI pipelines
Cost Control and Governance
- Understanding cost drivers for agents: API calls, memory, compute, and external integrations
- Tracking agent-level costs and implementing chargeback models
- Automation policies to prevent agent sprawl and idle resource consumption
CI/CD and Rollout Strategies for Agents
- Integrating agent pipelines into CI/CD systems
- Testing, versioning, and rollback strategies for iterative agent updates
- Progressive rollouts and safe deployment mechanisms
Failure Recovery and Reliability Engineering
- Designing for fault tolerance and graceful degradation
- Retry, timeout, and circuit breaker patterns for ensuring agent reliability
- Incident response and post-mortem frameworks for AI operations
Capstone Project
- Building and deploying an agentic AI system with comprehensive monitoring and cost tracking
- Simulating load, measuring performance, and optimizing resource usage
- Presenting the final architecture and monitoring dashboard to peers
Summary and Next Steps
Requirements
- Solid understanding of MLOps and production machine learning systems
- Experience with containerized deployments (Docker/Kubernetes)
- Familiarity with cloud cost optimization and observability tools
Target Audience
- MLOps engineers
- Site Reliability Engineers (SREs)
- Engineering managers overseeing AI infrastructure
21 Hours
Testimonials (3)
The trainer is patient and very helpful. He knows the topic well.
CLIFFORD TABARES - Universal Leaf Philippines, Inc.
Course - Agentic AI for Business Automation: Use Cases & Integration
Good mix of knowledge and practice
Ion Mironescu - Facultatea S.A.I.A.P.M.
Course - Agentic AI for Enterprise Applications
The mix of theory and practice and of high level and low level perspectives