EXO: End-to-End Local AI Cluster Deployment Training Course
EXO is an open-source framework that links Apple Silicon devices into a distributed AI cluster, enabling local inference of state-of-the-art models that exceed the memory capacity of a single device.
This instructor-led, live training (available online or onsite) is designed for system administrators and DevOps engineers looking to deploy, configure, and manage EXO clusters for private Large Language Model (LLM) inference across multiple Apple Silicon or Linux nodes.
Upon completion of this training, participants will be able to:
- Install and configure EXO on both macOS and Linux nodes.
- Use automatic device discovery to form multi-node clusters.
- Enable and validate RDMA over Thunderbolt 5 for low-latency communication between devices.
- Deploy frontier models (such as DeepSeek, Qwen, and Llama) across the clustered devices.
- Monitor cluster health and resolve common deployment issues.
Course Format
- Interactive lectures and discussions.
- Extensive exercises and practical sessions.
- Hands-on implementation within a live-lab environment.
Customization Options
- To request customized training, please contact us to arrange.
Course Outline
Introduction to EXO and Local AI Clustering
- Overview of the EXO framework and the exo-explore ecosystem.
- Comparison of centralized cloud inference versus distributed local inference.
- Architecture: libp2p device discovery, MLX backend, dashboard, and API layers.
- Hardware requirements: Apple Silicon (M3 Ultra, M4 Pro/Max), Thunderbolt 5, and shared storage.
Installing EXO on macOS
- Setting up Xcode, the Metal Toolchain, and macOS prerequisites.
- Installing uv, Node.js, and the Rust nightly toolchain.
- Installing the pinned macmon fork for Apple Silicon monitoring.
- Cloning the repository and building the dashboard using npm.
- Running EXO from source and verifying the localhost:52415 dashboard.
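The final verification step can be scripted. The following is a minimal sketch, assuming only that the dashboard listens on TCP port 52415 on the local machine as stated in the outline; the function name `dashboard_reachable` is hypothetical, and the check confirms that something accepts connections on the port, not that the dashboard itself rendered correctly.

```python
import socket


def dashboard_reachable(host: str = "localhost", port: int = 52415,
                        timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    state = "reachable" if dashboard_reachable() else "not reachable"
    print(f"Port 52415 on localhost is {state}")
```

A port-level check like this is useful in provisioning scripts; opening the dashboard in a browser remains the more complete verification.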
Installing EXO on Linux
- Installing dependencies via apt or Homebrew on Linux.
- Configuring uv, Node.js 18+, and Rust nightly.
- Building the dashboard and running EXO in CPU-only mode.
- Directory layout: XDG Base Directory paths for configuration, data, cache, and logs.
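The directory layout topic can be illustrated with a short sketch of how XDG Base Directory paths are resolved. The environment variables and their defaults come from the XDG Base Directory Specification; the `exo` subdirectory name and the idea that logs live under the state directory are assumptions made for illustration, not confirmed details of EXO's layout.

```python
import os
from pathlib import Path

# Environment variable and home-relative default for each directory kind,
# as defined by the XDG Base Directory Specification.
_XDG_DEFAULTS = {
    "config": ("XDG_CONFIG_HOME", ".config"),
    "data": ("XDG_DATA_HOME", ".local/share"),
    "cache": ("XDG_CACHE_HOME", ".cache"),
    "state": ("XDG_STATE_HOME", ".local/state"),
}


def xdg_dirs(app: str = "exo", env=None, home=None) -> dict[str, Path]:
    """Resolve per-application XDG directories (app name is an assumption)."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    dirs = {}
    for kind, (var, default) in _XDG_DEFAULTS.items():
        value = env.get(var, "")
        # The spec treats unset, empty, and relative values as invalid,
        # so all three fall back to the default under the home directory.
        base = Path(value) if value and os.path.isabs(value) else home / default
        dirs[kind] = base / app
    return dirs


if __name__ == "__main__":
    for kind, path in xdg_dirs().items():
        print(f"{kind:>6}: {path}")
```

Running the script on a node shows where an application following the specification would look for its files under the current environment.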
Automatic Device Discovery and Cluster Formation
- Understanding libp2p-based auto-discovery across local networks.
- Configuring custom namespaces with EXO_LIBP2P_NAMESPACE for cluster isolation.
- Verifying node membership in the dashboard cluster view.
- Handling discovery failures and network segmentation issues.
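Namespace isolation can be sketched as follows. The only detail taken from the outline is the `EXO_LIBP2P_NAMESPACE` variable name; the helper name `env_with_namespace`, the example namespace value, and the assumption that nodes sharing a namespace value discover one another are illustrative.

```python
import os


def env_with_namespace(namespace: str, base_env=None) -> dict[str, str]:
    """Return a copy of the environment with EXO_LIBP2P_NAMESPACE set."""
    if not namespace:
        raise ValueError("namespace must be a non-empty string")
    # Copy so the caller's mapping (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["EXO_LIBP2P_NAMESPACE"] = namespace
    return env


if __name__ == "__main__":
    env = env_with_namespace("lab-a")  # "lab-a" is a hypothetical namespace
    print(env["EXO_LIBP2P_NAMESPACE"])
```

The returned mapping can be passed as the `env` argument of `subprocess.run` together with whatever command starts EXO on the node, so that two clusters on the same network use different namespace values.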
Enabling RDMA over Thunderbolt 5
- Understanding RDMA architecture and the claimed 99 percent latency reduction.
- Enabling RDMA in macOS Recovery mode using rdma_ctl.
- Cable requirements and port topology constraints on Mac Studio.
- Ensuring macOS versions match across all cluster nodes.
- Troubleshooting RDMA discovery and DHCP configuration.
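The version-matching requirement lends itself to a simple pre-flight check. This is a minimal sketch, assuming the version string of each node has already been collected (for example with `sw_vers -productVersion` on each Mac); the node names and version numbers in the example are hypothetical.

```python
from collections import defaultdict


def group_by_version(node_versions: dict[str, str]) -> dict[str, list[str]]:
    """Group node names by their reported macOS version string."""
    groups = defaultdict(list)
    for node, version in node_versions.items():
        # Strip whitespace so trailing newlines from command output do not matter.
        groups[version.strip()].append(node)
    return {version: sorted(nodes) for version, nodes in groups.items()}


def versions_match(node_versions: dict[str, str]) -> bool:
    """Return True when every node reports the same version."""
    return len(group_by_version(node_versions)) <= 1


if __name__ == "__main__":
    reported = {"studio-1": "26.2", "studio-2": "26.2", "studio-3": "26.1"}
    if not versions_match(reported):
        for version, nodes in group_by_version(reported).items():
            print(f"{version}: {', '.join(nodes)}")
```

Grouping rather than only comparing makes it easy to see which nodes need updating before RDMA is enabled.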
Deploying Frontier Models
- Using the dashboard to load and shard DeepSeek V3.1, Qwen3-235B, and Llama family models.
- Previewing instance placements via the /instance/previews API endpoint.
- Creating model instances using pipeline or tensor-parallel sharding.
- Configuring custom model cards from the Hugging Face Hub.
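The idea behind pipeline sharding can be shown with a small sketch that assigns contiguous blocks of model layers to nodes in proportion to their memory. This is a generic illustration, not EXO's placement algorithm; the function name, node names, layer count, and memory figures are all hypothetical. Tensor-parallel sharding differs in that it splits the weights inside each layer across nodes instead of assigning whole layers.

```python
def split_layers(num_layers: int, memory_gb: dict[str, float]) -> dict[str, range]:
    """Assign contiguous layer ranges to nodes, proportional to node memory."""
    if num_layers < 0:
        raise ValueError("num_layers must be non-negative")
    if not memory_gb or any(mem <= 0 for mem in memory_gb.values()):
        raise ValueError("memory_gb must map each node to a positive size")

    total = sum(memory_gb.values())
    assignments = {}
    start = 0
    cumulative = 0.0
    nodes = list(memory_gb.items())
    for index, (node, mem) in enumerate(nodes):
        cumulative += mem
        if index == len(nodes) - 1:
            # The last node always takes every remaining layer.
            end = num_layers
        else:
            # Round the cumulative share so the ranges stay contiguous.
            end = round(num_layers * cumulative / total)
        assignments[node] = range(start, end)
        start = end
    return assignments


if __name__ == "__main__":
    plan = split_layers(60, {"studio-1": 512, "studio-2": 256, "studio-3": 256})
    for node, layers in plan.items():
        print(f"{node}: layers {layers.start}..{layers.stop - 1} ({len(layers)} layers)")
```

In a pipeline arrangement each node holds only its own block of layers and passes activations to the next node, which is why the combined memory of the cluster, not that of any single device, bounds the model size.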
Monitoring and Troubleshooting
- Reading EXO logs and understanding distributed tracing.
- Interpreting cluster health within the dashboard cluster view.
- Diagnosing worker node failures and reconnection behavior.
- Utilizing EXO_TRACING_ENABLED for performance bottleneck analysis.
Cluster Maintenance and Updates
- Updating EXO binaries and rebuilding the dashboard.
- Migrating model caches and managing pre-downloaded models over NFS.
- Gracefully removing nodes and rebalancing workloads.
Requirements
- A solid understanding of networking fundamentals (IP addressing, subnetting, firewalls).
- Experience with macOS or Linux command-line administration.
- Familiarity with Python package management (pip/uv) and Node.js tooling.
Audience
- System administrators.
- DevOps engineers.
- AI infrastructure architects tasked with on-premise LLM deployment.
Open Training Courses require 5+ participants.
Related Courses
Advanced LangGraph: Optimization, Debugging, and Monitoring Complex Graphs
35 Hours
LangGraph is a framework designed for creating stateful, multi-actor LLM applications through composable graphs that maintain persistent state and provide execution control.
This instructor-led, live training (available online or onsite) targets advanced AI platform engineers, AI DevOps specialists, and ML architects who aim to optimize, debug, monitor, and manage production-grade LangGraph systems.
By the conclusion of this training, participants will be equipped to:
- Design and optimize complex LangGraph topologies for enhanced speed, cost-efficiency, and scalability.
- Engineer reliability through retries, timeouts, idempotency, and checkpoint-based recovery mechanisms.
- Debug and trace graph executions, inspect state variables, and systematically reproduce production issues.
- Instrument graphs with logs, metrics, and traces; deploy to production; and monitor SLAs and costs.
Format of the Course
- Interactive lecture and discussion.
- Extensive exercises and practical application.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training session for this course, please contact us to arrange details.
Building Coding Agents with Devstral: From Agent Design to Tooling
14 Hours
Devstral is an open-source agentic large language model from Mistral AI, built to power coding agents that interact with code repositories, developer utilities, and APIs to boost engineering efficiency.
This instructor-led, live training (available online or on-site) targets intermediate to advanced ML engineers, developer-tooling teams, and Site Reliability Engineers (SREs) who aim to design, implement, and optimize coding agents using Devstral.
Upon completing this training, participants will be able to:
- Establish and configure the Devstral environment for coding agent development.
- Design agentic workflows for exploring and modifying codebases.
- Integrate coding agents with developer tools and APIs.
- Apply best practices for secure and efficient agent deployment.
Course Format
- Interactive lectures and discussions.
- Extensive exercises and practical application.
- Hands-on implementation in a live laboratory environment.
Customization Options
- To request tailored training for this course, please contact us to arrange the details.
Open-Source Model Ops: Self-Hosting, Fine-Tuning and Governance with Devstral & Mistral Models
14 Hours
Devstral and Mistral models are open-source AI technologies engineered for flexible deployment, fine-tuning, and scalable integration.
This instructor-led live training (available online or onsite) is tailored for intermediate to advanced ML engineers, platform teams, and research engineers who aim to self-host, fine-tune, and govern Mistral and Devstral models within production environments.
Upon completion of this training, participants will be capable of:
- Setting up and configuring self-hosted environments for Mistral and Devstral models.
- Applying fine-tuning techniques to enhance domain-specific performance.
- Implementing versioning, monitoring, and lifecycle governance strategies.
- Ensuring security, compliance, and responsible usage of open-source models.
Course Format
- Interactive lectures and discussions.
- Hands-on exercises focused on self-hosting and fine-tuning.
- Live-lab implementation of governance and monitoring pipelines.
Customization Options
- To request tailored training for this course, please contact us to arrange.
Fiji: Image Processing for Biotechnology and Toxicology
14 Hours
This instructor-led, live training in Romania (online or onsite) is aimed at beginner-level to intermediate-level researchers and laboratory professionals who wish to process and analyze images related to histological tissues, blood cells, algae, and other biological samples.
By the end of this training, participants will be able to:
- Navigate the Fiji interface and utilize ImageJ’s core functions.
- Preprocess and enhance scientific images for better analysis.
- Analyze images quantitatively, including cell counting and area measurement.
- Automate repetitive tasks using macros and plugins.
- Customize workflows for specific image analysis needs in biological research.
LangGraph Applications in Finance
35 Hours
LangGraph serves as a framework for constructing stateful, multi-agent LLM applications using composable graphs that maintain persistent state and provide precise control over execution flow.
This instructor-led live training, available online or on-site, targets intermediate to advanced professionals aiming to design, implement, and manage LangGraph-based financial solutions with robust governance, observability, and regulatory compliance.
Upon completion of this training, participants will be able to:
- Design finance-specific LangGraph workflows that align with regulatory and audit requirements.
- Integrate financial data standards and ontologies into graph states and associated tools.
- Implement reliability, safety measures, and human-in-the-loop controls for critical operations.
- Deploy, monitor, and optimize LangGraph systems to ensure high performance, cost efficiency, and adherence to SLAs.
Course Format
- Interactive lectures and discussions.
- Extensive exercises and practice.
- Hands-on implementation within a live-lab environment.
Customization Options
- To request customized training for this course, please contact us to arrange.
LangGraph Foundations: Graph-Based LLM Prompting and Chaining
14 Hours
LangGraph is a framework designed for constructing graph-structured LLM applications that facilitate planning, branching, tool utilization, memory management, and controlled execution.
This instructor-led, live training session (available online or onsite) is tailored for beginner-level developers, prompt engineers, and data practitioners who aim to design and implement reliable, multi-step LLM workflows using LangGraph.
By the conclusion of this training, participants will be capable of:
- Describing core LangGraph concepts (nodes, edges, state) and understanding their appropriate use cases.
- Constructing prompt chains that support branching, tool invocation, and memory retention.
- Integrating retrieval mechanisms and external APIs into graph-based workflows.
- Testing, debugging, and evaluating LangGraph applications to ensure reliability and safety.
Course Format
- Interactive lectures and facilitated discussions.
- Guided laboratory exercises and code walkthroughs within a sandbox environment.
- Scenario-based exercises focused on design, testing, and evaluation.
Course Customization Options
- To request a customized training for this course, please contact us to make arrangements.
LangGraph in Healthcare: Workflow Orchestration for Regulated Environments
35 Hours
LangGraph enables stateful, multi-actor workflows driven by LLMs, offering precise control over execution paths and state persistence. For the healthcare sector, these capabilities are essential for ensuring compliance, enabling interoperability, and developing decision-support systems that integrate with medical workflows.
This instructor-led, live training (available online or on-site) is designed for intermediate to advanced professionals looking to design, implement, and manage LangGraph-based healthcare solutions while navigating regulatory, ethical, and operational challenges.
Upon completion of this training, participants will be capable of:
- Designing healthcare-specific LangGraph workflows that prioritize compliance and auditability.
- Integrating LangGraph applications with medical ontologies and standards (FHIR, SNOMED CT, ICD).
- Applying best practices for reliability, traceability, and explainability within sensitive environments.
- Deploying, monitoring, and validating LangGraph applications in healthcare production settings.
Format of the Course
- Interactive lectures and discussions.
- Hands-on exercises based on real-world case studies.
- Implementation practice within a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us to arrange.
LangGraph for Legal Applications
35 Hours
LangGraph serves as a framework for developing stateful, multi-actor LLM applications through composable graphs that maintain persistent state and offer precise execution control.
This instructor-led live training, available online or onsite, targets intermediate to advanced professionals seeking to design, implement, and manage LangGraph-based legal solutions with robust compliance, traceability, and governance controls.
Upon completion, participants will be capable of:
- Designing legal-specific LangGraph workflows that ensure auditability and regulatory compliance.
- Integrating legal ontologies and document standards into graph state and processing logic.
- Implementing guardrails, human-in-the-loop approvals, and traceable decision paths.
- Deploying, monitoring, and maintaining LangGraph services in production environments with observability and cost management.
Course Format
- Interactive lectures and discussions.
- Extensive exercises and practice.
- Hands-on implementation within a live-lab environment.
Customization Options
- For customized training requests, please contact us to arrange.
Building Dynamic Workflows with LangGraph and LLM Agents
14 Hours
LangGraph serves as a framework designed for composing graph-structured LLM workflows that facilitate branching, tool utilization, memory management, and controllable execution.
This instructor-led, live training (available online or onsite) targets intermediate-level engineers and product teams aiming to merge LangGraph’s graph logic with LLM agent loops to create dynamic, context-aware applications, such as customer support agents, decision trees, and information retrieval systems.
Upon completing this training, participants will be capable of:
- Designing graph-based workflows that coordinate LLM agents, tools, and memory.
- Implementing conditional routing, retries, and fallback mechanisms for robust execution.
- Integrating retrieval, APIs, and structured outputs into agent loops.
- Evaluating, monitoring, and hardening agent behavior to ensure reliability and safety.
Format of the Course
- Interactive lectures and facilitated discussions.
- Guided labs and code walkthroughs within a sandbox environment.
- Scenario-based design exercises and peer reviews.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
LangGraph for Marketing Automation
14 Hours
LangGraph is a graph-based orchestration framework that enables conditional, multi-step LLM and tool workflows, ideal for automating and personalizing content pipelines.
This instructor-led, live training (online or onsite) is aimed at intermediate-level marketers, content strategists, and automation developers who wish to implement dynamic, branching email campaigns and content generation pipelines using LangGraph.
By the end of this training, participants will be able to:
- Design graph-structured content and email workflows with conditional logic.
- Integrate LLMs, APIs, and data sources for automated personalization.
- Manage state, memory, and context across multi-step campaigns.
- Evaluate, monitor, and optimize workflow performance and delivery outcomes.
Format of the Course
- Interactive lectures and group discussions.
- Hands-on labs implementing email workflows and content pipelines.
- Scenario-based exercises on personalization, segmentation, and branching logic.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Le Chat Enterprise: Private ChatOps, Integrations & Admin Controls
14 Hours
Le Chat Enterprise is a private ChatOps solution that provides secure, customizable, and governed conversational AI capabilities for organizations, with support for RBAC, SSO, connectors, and enterprise app integrations.
This instructor-led, live training (online or onsite) is aimed at intermediate-level product managers, IT leads, solution engineers, and security/compliance teams who wish to deploy, configure, and govern Le Chat Enterprise in enterprise environments.
By the end of this training, participants will be able to:
- Set up and configure Le Chat Enterprise for secure deployments.
- Enable RBAC, SSO, and compliance-driven controls.
- Integrate Le Chat with enterprise applications and data stores.
- Design and implement governance and admin playbooks for ChatOps.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Cost-Effective LLM Architectures: Mistral at Scale (Performance / Cost Engineering)
14 Hours
Mistral represents a high-performance suite of large language models, specifically engineered for scalable and cost-effective production deployments.
This instructor-led training, available both online and onsite, targets advanced infrastructure engineers, cloud architects, and MLOps leaders seeking to design, deploy, and optimize Mistral-based architectures to achieve peak throughput while minimizing costs.
Upon completing this training, participants will be equipped to:
- Execute scalable deployment patterns for Mistral Medium 3.
- Utilize batching, quantization, and efficient serving strategies.
- Reduce inference expenses without compromising performance.
- Architect production-ready serving topologies tailored for enterprise workloads.
Course Format
- Interactive lectures and discussions.
- Extensive exercises and practical application.
- Hands-on implementation within a live-lab environment.
Customization Options
- For tailored training on this topic, please contact us to arrange.
Productizing Conversational Assistants with Mistral Connectors & Integrations
14 Hours
Mistral AI offers an open-source AI platform that empowers teams to develop and embed conversational assistants into both enterprise operations and customer-facing workflows.
This instructor-led live training, available either online or onsite, targets beginner to intermediate product managers, full-stack developers, and integration engineers looking to design, integrate, and commercialize conversational assistants using Mistral connectors and integrations.
Upon completion of this training, participants will be able to:
- Connect Mistral conversational models with enterprise and SaaS connectors.
- Implement retrieval-augmented generation (RAG) to ensure grounded responses.
- Create UX patterns for both internal and external chat assistants.
- Deploy assistants into product workflows for practical, real-world applications.
Course Format
- Interactive lectures and discussions.
- Practical integration exercises.
- Live lab sessions for developing conversational assistants.
Course Customization Options
- To arrange a customized training session for this course, please contact us.
Enterprise-Grade Deployments with Mistral Medium 3
14 Hours
Mistral Medium 3 is a high-performance, multimodal large language model engineered for production-grade deployment within enterprise environments.
This instructor-led live training, available either online or on-site, targets intermediate to advanced AI/ML engineers, platform architects, and MLOps teams looking to deploy, optimize, and secure Mistral Medium 3 for enterprise use cases.
Upon completing this training, participants will be able to:
- Deploy Mistral Medium 3 via API and self-hosted solutions.
- Optimize inference performance and associated costs.
- Implement multimodal use cases utilizing Mistral Medium 3.
- Apply security and compliance best practices suitable for enterprise environments.
Course Format
- Interactive lectures and discussions.
- Extensive exercises and practice sessions.
- Hands-on implementation within a live lab environment.
Customization Options
- To request customized training for this course, please contact us to arrange details.
Mistral for Responsible AI: Privacy, Data Residency & Enterprise Controls
14 Hours
Mistral AI offers an open, enterprise-ready AI platform designed to facilitate secure, compliant, and responsible AI deployment.
This instructor-led training, available online or onsite, is designed for intermediate-level compliance leads, security architects, and legal/operations stakeholders seeking to implement responsible AI practices with Mistral by leveraging its privacy, data residency, and enterprise control capabilities.
Upon completing this training, participants will be able to:
- Deploy privacy-preserving techniques within Mistral environments.
- Apply data residency strategies to satisfy regulatory requirements.
- Establish enterprise-grade controls, including RBAC, SSO, and audit logging.
- Evaluate vendor and deployment options to ensure compliance alignment.
Course Format
- Interactive lectures and discussions.
- Case studies and exercises focused on compliance.
- Hands-on implementation of enterprise AI controls.
Customization Options
- To request a customized version of this course, please contact us to arrange.