IR Interface Overview

Multi-level IR Abstraction

  • High-level abstract interfaces that hide low-level details and map hardware-agnostic expressions to low-level instructions, improving operator development usability

  • Fine-grained performance control interfaces for precise control of on-chip memory addresses, pipeline sync insertion, and ping-pong pipeline optimization

  • Multi-level interfaces support flexible integration of custom DSLs and frameworks for high-performance custom operators on Ascend NPU

  Torch-MLIR / Triton       (Framework/DSL layer)
         |
         v
  Linalg / Tensor            (General tensor algebra layer)
         |
         v
  HFusion                    (Hardware-aware fusion & scheduling layer)
         |
         v
  HIVM                       (NPU instruction layer)
         |
         v
  LIR -> Binary              (Machine code generation)
  • Linalg / Tensor layer: Standard MLIR dialects for operator semantics (Elemwise, Broadcast, Reduce, Transpose, Concat, etc.); HFusion performs fusion, tiling, and scheduling automatically

  • HFusion layer: Ascend-NPU-aware named ops (e.g. hfusion.elemwise_unary, hfusion.cast, hfusion.select, hfusion.reduce_with_index), tensor semantics, automatic bufferization, tiling, and scheduling

  • HIVM layer: Direct mapping to NPU instructions, explicit control of memory hierarchy (GM/UB/L1/L0), compute pipelines (Vector/Cube/MTE), and sync primitives for fine-grained tuning

These layers allow custom DSLs and frameworks to integrate. Triton and PyTorch connect via IR conversion for high-performance custom operators on Ascend NPU.

Compile Options and Function Attributes

Compile options

bishengir-compile provides these common options:

Option

Default

Description

-target

Ascend<Name>

Target device (core count, on-chip memory size and other hardware specifications), queried via npu-smi info.

-block-dim

1

Number of blocks; compiled kernel carries hacc.block_dim

-enable-hfusion-compile

false

Enable HFusion pipeline (fusion, scheduling, tiling)

-enable-hivm-compile

true

Enable HIVM pipeline (lower to HIVM and optimize)

-enable-torch-compile

false

Enable Torch-MLIR pipeline

-enable-triton-kernel-compile

false

Enable Triton kernel pipeline

Supported target devices include the Atlas A2/A3, Ascend 950PR/Ascend 950DT series.

Function attributes

The following attributes mark kernel entry functions and are shared across all integration paths:

Attribute

Description

hacc.entry

Marks the current function as the kernel entry

hacc.function_kind = #hacc.function_kind<DEVICE>

Function runs on the DEVICE side

hacc.function_kind = #hacc.function_kind<HOST>

Function runs on the HOST side; HFusion will outline device kernels

Example:

func.func @kernel(...) attributes {hacc.entry, hacc.function_kind = #hacc.function_kind<DEVICE>} {
  ...
}

Triton integration

Triton is a widely used language for high-performance kernel development. Triton Ascend converts Triton kernels to MLIR for the AscendNPU IR stack and enables running Triton kernels on Ascend NPU. See Triton interface for installation, environment, op mapping, and Ascend extensions.

TileLang integration

TileLang (tilelang-ascend) is a domain-specific language for Ascend NPU kernel development, built on tile-lang’s Pythonic syntax and TVM. It supports GEMM, vector operations, and attention mechanisms, compiling kernels to AscendNPU IR (HIVM) for execution on Ascend NPU. See TileLang interface for installation and usage details.

Framework integration

AscendNPU IR supports framework integration (PyTorch/TensorFlow/MindSpore) in two ways:

  • DSL integration: e.g., Triton and TileLang

  • IR integration: e.g., Torch IR, Linalg/HFusion IR, and HIVM IR

See Framework interface for details.