Migrating CUDA Applications to Chinese GPU Architectures Training Course
Chinese GPU architectures, including Huawei Ascend, Biren, and Cambricon MLUs, provide alternatives to CUDA that are specifically designed for the local AI and high-performance computing (HPC) markets.
This instructor-led live training, available both online and onsite, is designed for advanced-level GPU programmers and infrastructure specialists who aim to migrate and optimize their existing CUDA applications for deployment on Chinese hardware platforms.
Upon completion of this training, participants will be capable of:
- Evaluating the compatibility of existing CUDA workloads with Chinese chip alternatives.
- Porting CUDA codebases to Huawei CANN, Biren SDK, and Cambricon BANGPy environments.
- Comparing performance across platforms and identifying key optimization areas.
- Addressing practical challenges related to cross-architecture support and deployment.
Course Format
- Interactive lectures and discussions.
- Hands-on labs involving code translation and performance comparisons.
- Guided exercises focused on multi-GPU adaptation strategies.
Course Customization Options
- To request customized training for this course tailored to your specific platform or CUDA project, please contact us to make arrangements.
Course Outline
Overview of the Chinese AI GPU Ecosystem
- Comparison of Huawei Ascend, Biren, and Cambricon MLU
- Comparison between CUDA and CANN, Biren SDK, and BANGPy models
- Industry trends and vendor ecosystems
Preparing for Migration
- Assessing your CUDA codebase
- Identifying target platforms and SDK versions
- Installing toolchains and setting up the environment
Code Translation Techniques
- Porting CUDA memory access and kernel logic
- Mapping compute grid and thread models
- Options for automated versus manual translation
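As a taste of the kind of exercise covered here, the sketch below illustrates in plain Python (no vendor SDK assumed) the CUDA indexing scheme that any port must preserve: a 1-D launch computes each thread's global index as `blockIdx.x * blockDim.x + threadIdx.x`, with an out-of-range guard. The grid and block sizes are arbitrary examples.

```python
# Plain-Python illustration of CUDA's 1-D launch geometry; a port to
# another platform's thread model must reproduce this indexing exactly.

def cuda_global_ids(grid_dim, block_dim):
    """Enumerate the global thread IDs a 1-D CUDA launch would produce."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            yield block_idx * block_dim + thread_idx

def vector_add(a, b, grid_dim, block_dim):
    """Emulate a guarded 1-D vector-add kernel over the launch geometry."""
    out = [0.0] * len(a)
    for gid in cuda_global_ids(grid_dim, block_dim):
        if gid < len(a):  # the usual out-of-range guard
            out[gid] = a[gid] + b[gid]
    return out

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
# 2 blocks of 4 threads cover 5 elements; the last 3 threads are masked off.
print(vector_add(a, b, grid_dim=2, block_dim=4))  # [11.0, 22.0, 33.0, 44.0, 55.0]
```

When translating to a target SDK, the mapping from this flat index to the platform's own grid or task abstraction is exactly what the manual-versus-automated translation options above differ on.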
Platform-Specific Implementations
- Utilizing Huawei CANN operators and custom kernels
- Understanding the Biren SDK conversion pipeline
- Rebuilding models with BANGPy (Cambricon)
Cross-Platform Testing and Optimization
- Profiling execution on each target platform
- Comparing memory tuning and parallel execution
- Performance tracking and iterative improvement
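A minimal harness like the following (a plain-Python sketch; backend names and workloads are placeholders, and real measurements would add device synchronization before stopping the clock) is the pattern used for tracking the same workload across platforms:

```python
import time
from statistics import median

def benchmark(fn, *args, repeats=5, warmup=1):
    """Median wall-clock time of fn(*args); warmup runs absorb one-time costs."""
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return median(times)

# Stand-ins for the same workload built against two different backends.
def backend_a(n): return sum(i * i for i in range(n))
def backend_b(n): return sum(i * i for i in range(n))

results = {name: benchmark(fn, 100_000)
           for name, fn in [("backend_a", backend_a), ("backend_b", backend_b)]}
for name, t in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: {t * 1e3:.2f} ms")
```

Taking the median over several repeats, rather than a single run, keeps one scheduler hiccup from skewing a cross-platform comparison.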
Managing Mixed GPU Environments
- Hybrid deployments involving multiple architectures
- Fallback strategies and device detection
- Implementing abstraction layers for code maintainability
Case Studies and Best Practices
- Porting vision and NLP models to Ascend or Cambricon
- Retrofitting inference pipelines on Biren clusters
- Handling version mismatches and API gaps
Summary and Next Steps
Requirements
- Experience in programming with CUDA or GPU-based applications
- Understanding of GPU memory models and compute kernels
- Familiarity with AI model deployment or acceleration workflows
Audience
- GPU programmers
- System architects
- Porting specialists
Related Courses
Developing AI Applications with Huawei Ascend and CANN
21 Hours
Huawei Ascend represents a suite of AI processors engineered to deliver high-performance capabilities for both inference and training tasks.
This instructor-led live training session, available either online or onsite, is designed for AI engineers and data scientists at an intermediate level who aim to develop and refine neural network models utilizing Huawei’s Ascend platform alongside the CANN toolkit.
Upon completion of this training, participants will be equipped to:
- Establish and configure the CANN development environment.
- Create AI applications leveraging MindSpore and CloudMatrix workflows.
- Enhance performance on Ascend NPUs through the use of custom operators and tiling techniques.
- Deploy models across edge or cloud infrastructure.
Course Delivery Format
- Engaging lectures paired with interactive discussions.
- Practical application of Huawei Ascend and the CANN toolkit within sample projects.
- Guided exercises targeting model construction, training, and deployment.
Customization Opportunities
- For tailored training based on your specific infrastructure or datasets, please reach out to us to make arrangements.
Deploying AI Models with CANN and Ascend AI Processors
14 Hours
CANN (Compute Architecture for Neural Networks) serves as Huawei's comprehensive AI compute stack, designed for deploying and optimizing AI models on Ascend AI processors.
This instructor-led, live training session, available both online and onsite, targets intermediate-level AI developers and engineers seeking to efficiently deploy trained AI models onto Huawei Ascend hardware. The curriculum utilizes the CANN toolkit alongside popular frameworks such as MindSpore, TensorFlow, and PyTorch.
Upon completion of this training, participants will be able to:
- Comprehend the CANN architecture and its critical function within the AI deployment pipeline.
- Convert and adapt models from leading frameworks into formats compatible with Ascend devices.
- Leverage tools such as ATC, OM model conversion, and MindSpore for both edge and cloud inference applications.
- Identify deployment challenges and optimize performance on Ascend hardware.
Course Format
- Interactive lectures combined with live demonstrations.
- Practical lab exercises utilizing CANN tools, Ascend simulators, or physical devices.
- Real-world deployment scenarios based on existing AI models.
Customization Options
- For personalized training requirements, please contact us to arrange a customized session.
AI Inference and Deployment with CloudMatrix
21 Hours
CloudMatrix represents Huawei’s integrated platform for AI development and deployment, specifically engineered to facilitate scalable, production-ready inference pipelines.
This instructor-led live training, available both online and onsite, is designed for beginner to intermediate AI professionals aiming to deploy and oversee AI models utilizing the CloudMatrix platform, enhanced by CANN and MindSpore integration.
Upon completion of this training, participants will gain the ability to:
- Leverage CloudMatrix for model packaging, deployment, and serving.
- Convert and optimize models for Ascend chipsets.
- Establish pipelines for both real-time and batch inference tasks.
- Monitor deployments and optimize performance within production environments.
Course Format
- Interactive lectures and discussions.
- Practical application of CloudMatrix through real-world deployment scenarios.
- Guided exercises concentrating on conversion, optimization, and scaling.
Course Customization Options
- To arrange customized training for this course tailored to your specific AI infrastructure or cloud environment, please get in touch with us.
GPU Programming on Biren AI Accelerators
21 Hours
Biren AI Accelerators are high-performance GPUs designed for AI and HPC workloads with support for large-scale training and inference.
This instructor-led, live training (online or onsite) is aimed at intermediate-level to advanced-level developers who wish to program and optimize applications using Biren’s proprietary GPU stack, with practical comparisons to CUDA-based environments.
By the end of this training, participants will be able to:
- Understand Biren GPU architecture and memory hierarchy.
- Set up the development environment and use Biren’s programming model.
- Translate and optimize CUDA-style code for Biren platforms.
- Apply performance tuning and debugging techniques.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of Biren SDK in sample GPU workloads.
- Guided exercises focused on porting and performance tuning.
Course Customization Options
- To request customized training for this course based on your application stack or integration needs, please contact us to make arrangements.
Cambricon MLU Development with BANGPy and Neuware
21 Hours
Cambricon MLUs (Machine Learning Units) are specialized AI chips designed to optimize both inference and training tasks in edge computing and data center environments.
This instructor-led live training, available online or onsite, is tailored for intermediate-level developers looking to build and deploy AI models utilizing the BANGPy framework and Neuware SDK on Cambricon MLU hardware.
Upon completing this training, participants will be able to:
- Set up and configure development environments for both BANGPy and Neuware.
- Develop and optimize Python- and C++-based models tailored for Cambricon MLUs.
- Deploy models to edge and data center devices operating on the Neuware runtime.
- Integrate machine learning workflows with acceleration features specific to MLU.
Course Format
- Interactive lectures and discussions.
- Practical application of BANGPy and Neuware for development and deployment.
- Guided exercises focusing on optimization, integration, and testing.
Customization Options
- To arrange customized training for this course based on your specific Cambricon device model or use case, please contact us.
Introduction to CANN for AI Framework Developers
7 Hours
CANN (Compute Architecture for Neural Networks) serves as Huawei’s comprehensive AI computing toolkit, designed to compile, optimize, and deploy artificial intelligence models on Ascend AI processors.
This instructor-led live training, available in online or onsite formats, is tailored for beginner-level AI developers. It provides a clear understanding of how CANN integrates into the end-to-end model lifecycle—from training through to deployment—and demonstrates its compatibility with major frameworks such as MindSpore, TensorFlow, and PyTorch.
Upon completing this training, participants will be equipped to:
- Grasp the fundamental purpose and architectural design of the CANN toolkit.
- Configure a development environment utilizing both CANN and MindSpore.
- Successfully convert and deploy a basic AI model onto Ascend hardware.
- Acquire the foundational knowledge necessary for future initiatives involving CANN optimization or integration.
Course Format
- Engaging lectures combined with interactive discussions.
- Practical, hands-on labs focused on simple model deployment.
- Detailed, step-by-step walkthroughs of the CANN toolchain and key integration points.
Customization Options
- For organizations seeking tailored training solutions, please reach out to us to arrange a customized session.
CANN for Edge AI Deployment
14 Hours
Huawei's Ascend CANN toolkit enables high-performance AI inference on edge devices such as the Ascend 310. It provides the essential tools needed to compile, optimize, and deploy models in environments where computing power and memory are limited.
This instructor-led, live training (available online or onsite) is designed for intermediate AI developers and integrators who want to deploy and optimize models on Ascend edge devices using the CANN toolchain.
By the end of this training, participants will be able to:
- Prepare and convert AI models for the Ascend 310 using CANN tools.
- Build lightweight inference pipelines using MindSpore Lite and AscendCL.
- Optimize model performance for environments with constrained compute and memory.
- Deploy and monitor AI applications in real-world edge scenarios.
Course Format
- Interactive lectures and demonstrations.
- Hands-on labs featuring edge-specific models and scenarios.
- Live deployment examples on virtual or physical edge hardware.
Customization Options
- To request customized training for this course, please contact us to make arrangements.
Understanding Huawei’s AI Compute Stack: From CANN to MindSpore
14 Hours
Huawei’s comprehensive AI ecosystem, ranging from the foundational CANN SDK to the advanced MindSpore framework, provides a seamlessly integrated environment for developing and deploying AI solutions, specifically optimized for Ascend hardware.
This instructor-led training session, available online or onsite, is designed for technical professionals ranging from beginner to intermediate levels who aim to grasp the synergy between CANN and MindSpore components to inform AI lifecycle management and infrastructure strategies.
Upon completing this course, participants will be equipped to:
- Grasp the layered architecture of Huawei’s AI compute stack.
- Recognize how CANN facilitates model optimization and hardware-level deployment.
- Assess the MindSpore framework and its toolchain in comparison to industry standards.
- Strategically position Huawei's AI stack within enterprise or cloud/on-premises settings.
Course Format
- Interactive lectures and guided discussions.
- Live system demonstrations and case-based walkthroughs.
- Optional guided labs covering the model flow from MindSpore to CANN.
Customization Options
- To arrange a tailored training session for this course, please reach out to us.
Optimizing Neural Network Performance with CANN SDK
14 Hours
The CANN SDK (Compute Architecture for Neural Networks) serves as Huawei’s foundational AI compute platform, empowering developers to fine-tune and maximize the performance of neural networks deployed on Ascend AI processors.
This instructor-led live training session, available both online and onsite, is tailored for advanced AI developers and system engineers aiming to boost inference performance through CANN’s sophisticated toolset. Key areas include the Graph Engine, TIK, and custom operator development.
Upon completion of this training, participants will be equipped to:
- Comprehend the runtime architecture and performance lifecycle within CANN.
- Leverage profiling tools and the Graph Engine for detailed performance analysis and optimization.
- Develop and refine custom operators utilizing TIK and TVM.
- Address memory bottlenecks and enhance model throughput.
Course Format
- Interactive lectures and group discussions.
- Practical labs featuring real-time profiling and operator tuning.
- Optimization exercises grounded in real-world edge-case deployment scenarios.
Customization Options
- For organizations seeking tailored training for this course, please contact us to arrange specifics.
CANN SDK for Computer Vision and NLP Pipelines
14 Hours
The CANN SDK (Compute Architecture for Neural Networks) equips developers with robust deployment and optimization tools for real-time AI applications in computer vision and NLP, specifically tailored for Huawei Ascend hardware.
This instructor-led live training, available online or onsite, targets intermediate-level AI practitioners aiming to build, deploy, and optimize vision and language models via the CANN SDK for production environments.
Upon completion, participants will be capable of:
- Deploying and optimizing CV and NLP models utilizing CANN and AscendCL.
- Employing CANN utilities to convert models and integrate them into active pipelines.
- Enhancing inference performance for tasks such as detection, classification, and sentiment analysis.
- Constructing real-time CV/NLP pipelines suitable for edge or cloud-based deployment scenarios.
Course Format
- Interactive lectures accompanied by demonstrations.
- Practical labs focused on model deployment and performance profiling.
- Live pipeline design exercises using real-world CV and NLP use cases.
Customization Options
- For customized training arrangements for this course, please reach out to us.
Building Custom AI Operators with CANN TIK and TVM
14 Hours
CANN TIK (Tensor Instruction Kernel) and Apache TVM provide advanced capabilities for the optimization and customization of AI model operators tailored for Huawei Ascend hardware.
This instructor-led, live training, available both online and onsite, is designed for advanced system developers aiming to build, deploy, and fine-tune custom operators for AI models. The curriculum leverages CANN’s TIK programming model alongside TVM compiler integration.
Upon completing this training, participants will be equipped to:
- Develop and test custom AI operators utilizing the TIK DSL for Ascend processors.
- Seamlessly integrate custom operators into the CANN runtime and execution graphs.
- Apply TVM for operator scheduling, automatic tuning, and performance benchmarking.
- Debug and optimize instruction-level performance across various custom computation patterns.
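The core transformation behind schedule primitives such as TVM's split and tile can be previewed in plain Python (no TVM installation assumed; the tile size is an arbitrary example): the loop nest is restructured so each tile's working set could stay resident in fast on-chip memory, without changing the computed result.

```python
# Plain-Python sketch of loop tiling, the transformation that schedule
# primitives like split/tile perform on an operator's loop nest.

def sum_naive(data):
    """Reference: a single flat loop over the data."""
    return sum(data)

def sum_tiled(data, tile=4):
    """Walk the array tile by tile, as a tiled schedule would."""
    total = 0
    for start in range(0, len(data), tile):          # outer loop over tiles
        for i in range(start, min(start + tile, len(data))):  # inner loop in-tile
            total += data[i]
    return total

data = list(range(10))
assert sum_tiled(data) == sum_naive(data)  # tiling must not change results
print(sum_tiled(data))  # 45
```

Auto-tuning, as covered above, is essentially a search over parameters like `tile` for the variant that performs best on the target hardware.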
Course Format
- Engaging lectures combined with practical demonstrations.
- Practical coding exercises for operators using TIK and TVM pipelines.
- Testing and tuning activities performed on Ascend hardware or emulators.
Course Customization Options
- For inquiries regarding customized training for this course, please reach out to us to arrange a session.
Performance Optimization on Ascend, Biren, and Cambricon
21 Hours
Ascend, Biren, and Cambricon stand out as premier AI hardware solutions in China, each providing specialized acceleration and profiling capabilities tailored for large-scale AI operations.
This live, instructor-led training (available online or onsite) is designed for advanced AI infrastructure and performance engineers aiming to enhance model inference and training processes across these diverse Chinese AI chip ecosystems.
Upon completion of this program, participants will be equipped to:
- Conduct benchmarking of models on Ascend, Biren, and Cambricon environments.
- Diagnose system bottlenecks and identify memory or compute inefficiencies.
- Implement optimizations at the graph, kernel, and operator levels.
- Refine deployment pipelines to maximize throughput and minimize latency.
Course Delivery Format
- Interactive lectures combined with group discussions.
- Practical application of profiling and optimization tools across each platform.
- Guided exercises centered on real-world tuning scenarios.
Customization Options
- For tailored training aligned with your specific performance environment or model architecture, please contact us to arrange a personalized session.