Course Outline
Introduction to Custom Operator Development
- Rationale for custom operators: use cases and constraints.
- CANN runtime architecture and operator integration points.
- Overview of TBE, TIK, and TVM within the Huawei AI ecosystem.
Low-Level Operator Programming with TIK
- The TIK programming model and its supported APIs.
- Memory management strategies and tiling approaches in TIK.
- Steps to create, compile, and register a custom operator with CANN.
Testing and Validation of Custom Operators
- Unit and integration testing of operators within the graph.
- Troubleshooting kernel-level performance bottlenecks.
- Visualizing operator execution and buffer behavior.
TVM-Driven Scheduling and Optimization
- Overview of TVM as a compiler for tensor operations.
- Crafting schedules for custom operators in TVM.
- Performing TVM tuning, benchmarking, and code generation for Ascend.
Integration with Frameworks and Models
- Registering custom operators for MindSpore and ONNX compatibility.
- Ensuring model integrity and analyzing fallback mechanisms.
- Supporting multi-operator graphs that use mixed precision.
Case Studies and Specialized Optimizations
- Case study: efficient convolution for small input shapes.
- Case study: memory-aware optimization of attention operators.
- Best practices for deploying custom operators across various devices.
Summary and Next Steps
Requirements
- A solid understanding of AI model internals and operator-level computation.
- Practical experience with Python and Linux development environments.
- Familiarity with neural network compilers or graph-level optimization techniques.
Target Audience
- Compiler engineers engaged in AI toolchain development.
- Systems developers specializing in low-level AI optimization.
- Developers creating custom operators or targeting emerging AI workloads.
14 Hours