Mulțumim pentru trimiterea solicitării! Un membru al echipei noastre vă va contacta în curând.
Mulțumim pentru trimiterea rezervării! Un membru al echipei noastre vă va contacta în curând.
Schița de curs
AI Sovereignty and LLM Local Deployment
- Risks of cloud LLMs: data retention, training on inputs, foreign jurisdiction.
- Ollama architecture: model server, registry, and OpenAI-compatible API.
- Comparison with vLLM, llama.cpp, and Text Generation Inference.
- Model licensing: Llama, Mistral, Qwen, and Gemma terms.
Installation and Hardware Setup
- Installing Ollama on Linux with CUDA and ROCm support.
- CPU-only fallback and AVX/AVX2 optimization.
- Docker deployment and persistent volume mapping.
- Multi-GPU setup and VRAM allocation strategies.
Model Management
- Pulling models from the Ollama registry: ollama pull llama3.
- Importing GGUF models from HuggingFace and TheBloke.
- Quantization levels: Q4_K_M, Q5_K_M, Q8_0 tradeoffs.
- Model switching and concurrent model loading limits.
Custom Modelfiles
- Writing Modelfile syntax: FROM, PARAMETER, SYSTEM, TEMPLATE.
- Temperature, top_p, and repeat_penalty tuning.
- System prompt engineering for role-specific behavior.
- Creating and publishing custom models to local registry.
API Integration
- OpenAI-compatible /v1/chat/completions endpoint.
- Streaming responses and JSON mode.
- Integrating with LangChain, LlamaIndex, and custom apps.
- Authentication and rate limiting with reverse proxy.
Performance Optimization
- Context window sizing and KV cache management.
- Batch inference and parallel request handling.
- CPU thread allocation and NUMA awareness.
- Monitoring GPU utilization and memory pressure.
Security and Compliance
- Network isolation for model serving endpoints.
- Input filtering and output moderation pipelines.
- Audit logging of prompts and completions.
- Model provenance and hash verification.
Cerințe
- Intermediate Linux and container administration.
- Understanding of machine learning and transformer models at high level.
- Familiarity with REST APIs and JSON.
Audience
- AI engineers and developers replacing cloud LLM APIs.
- Organizations with data sensitivity preventing cloud model usage.
- Government and defense teams requiring air-gapped language models.
14 Ore