Auto-Sync

Auto-sync is the AscendNPU-IR (HIVM) compiler feature that automatically inserts synchronization operations so producers and consumers of shared data or resources are correctly ordered. Goals: correctness (no data races or ordering bugs) and minimal overhead (fewest syncs needed, reuse of hardware events when safe).

Hardware Background

AICore Architecture

https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/83RC1/opdevg/Ascendcopdevg/atlas_ascendc_10_0008.html

HIVM Synchronization Operations

Synchronization ops are defined in HIVMIR/HIVMSynchronizationOps.td. Below they are described in terms of MLIR usage (operands/attributes), not assembly syntax.

Intra-Core-Sync (Normal-Sync)

  • hivm.set_flag
    Operands/attributes: set_pipe, wait_pipe and flag_id
    Executes on set_pipe after all previous instructions on that pipe have finished. Triggers flag_id on execution

  • hivm.wait_flag
    Operands/attributes: set_pipe, wait_pipe and flag_id
    Executes on wait_pipe
    Blocks all following instructions until flag_id is triggered

  • hivm.pipe_barrier
    Operands/attributes: pipe
    Barrier across a given pipe.
    Block all following instructions on pipe until all previous instructions finish.

Cross-Core-Sync (Block-Sync) (Intra-Block)

  • hivm.sync_block_set
    Operands/attributes:

    • tcore_type target core type (vector/cube)

    • tpipe, pipe (set/wait pipes on target core)

    • sync_instr_mode (default INTRA_BLOCK_SYNCHRONIZATION)

    • event_id

    Executes on tpipe (set_pipe) on the tcore_type core after all previous instructions on the same core.pipe finish. Sets event_id

  • hivm.sync_block_wait
    Operands/attributes:

    • tcore_type target core type (vector/cube)

    • tpipe, pipe (set/wait pipes on target core)

    • sync_instr_mode (default INTRA_BLOCK_SYNCHRONIZATION)

    • event_id

    Executes on pipe (pipe_wait) on the tcore_type
    Block all following instructions on pipe on the tcore_type core until all previous instructions finish.

Algorithm principles

AutoSync solutions overview

The codebase provides two auto-sync solutions:

  • Inject-Sync/Inject-Block-Sync Passes

    Uses multiple passes to insert needed sync operations, remove redundant ones, and allocate flag IDs/event IDs using liveliness analysis. It is the primary solution enabled by default.

  • Graph-Sync-Solver/Cross-Core-GSS Passes

    Uses graph-based algorithms to analyze the input code structure and insert needed sync operations. It remains optional and can be enabled via -hivm-enable-graph-sync-solver=true (or sync_solver=True in Triton-Ascend).

InjectSync

alt text

Purpose: Insert core-level (intra-core) synchronization (set_flag / wait_flag) using memory-dependence analysis, sync analysis, event-id allocation, and cleanup (move/remove redundant syncs).

Source:

  • Headers: include/../InjectSync/.

  • Implementation: lib/../InjectSync/ (e.g. InjectSync.cpp, MemoryDependentAnalyzer.cpp, SyncAnalysis.cpp, SyncEventIdAllocation.cpp, IRTranslator.cpp, SyncCodegen.cpp, MoveSyncState.cpp, RemoveRedundantSync.cpp, SyncCommon.cpp).

Stages:

  1. IRTranslator:
    Build Sync-IR from the input function (compound elements, loops, conditions, memory ops).

  2. SyncAnalyzer:
    For each pair of conflicting operations, it inserts a pair of set_flag/wait_flag operations or a barrier(pipe) operations if both operations are of same pipe.

  3. MoveSyncState:
    Reposition sync ops to reduce stalls while preserving semantics.

  4. RemoveRedundantSync:
    Drop redundant sync pairs.

  5. SyncEventIdAllocation:
    Assign static or dynamic event IDs; reuse when safe.

  6. SyncCodegen:
    Emit hivm.set_flag / hivm.wait_flag / hivm.barrier

InjectBlockSync

Purpose: Insert block-level (intra-block) (cross-core) synchronization for MIX kernels (cube and vector): sync_block_set, sync_block_wait.

Source: InjectBlockSync.cpp InjectBlockSync.h

Behavior:

  • Runs only on MIX kernels (not host, not pure AIC/AIV).

  • Inserts SetFFTSBaseAddrOp when an FFTS base addr kernel argument is present.

  • Three modes (controlled by options and fusion kind):

    • InjectAllBlockSync — Emit block sync before/after every LoadOp and every StoreOp (cube/vector handoff).

    • InjectBlockMixSync — Full mix: build block sync IR via SyncBlockIRTranslator, then run SyncAnalyzer (BLOCKSYNC mode), MoveSyncState, RemoveRedundantSync, SyncEventIdAllocation, SyncCodegen.

GraphSyncSolver

alt text

Purpose: Alternative to the Inject-Sync solution; it uses graph-based algorithms to decide when to insert pairs of set/wait operations and assign event IDs.

Source:

  • Headers: include/../GraphSyncSolver/

  • Implementation: lib/../GraphSyncSolver/ (GraphSyncSolver.cpp, SyncSolver.cpp, SyncSolverIR.cpp, SyncSolverIRTranslator.cpp, SyncSolverCodeGen.cpp, GraphSolver.cpp, EventIdSolver.cpp, Utility.cpp, SyncSolverTest.cpp, SyncSolverTester.cpp).

Stages:

  1. IRTranslator:
    Build Sync-IR from the input function (function, scopes, loops, conditions, rw-operations).

  2. Solver:
    Collect conflict pairs (producer–consumer pairs), run pair selection and ordering, optionally reuse conflict pairs to save event IDs.

  3. CodeGenerator:
    Translate solver result back to MLIR: emit hivm.set_flag / hivm.wait_flag / hivm.barrier

CrossCoreGSS

Purpose: Insert block-level (intra-block) (cross-core) synchronization for MIX kernels (cube and vector): sync_block_set, sync_block_wait.

Source: CrossCoreGSS.h CrossCoreGSS.cpp; reuses IRTranslator, Solver, and CodeGenerator from GraphSyncSolver.

How it works:

  • Same as the intra-core GSS pass, but it handles cross-core memory operations.

Interface description

Command Line Options

These are typically wired in the compiler driver (e.g. bishengir-hivm-compile); see Passes.td and tools under bishengir/lib/Tools/ for exact mapping.

Flag Type Default Description
`--disable-auto-inject-block-sync` bool false Disable automatic block-level set/wait insertion (InjectBlockSync / CrossCoreGSS).
`--disable-hivm-auto-inject-sync` bool false Disable InjectSync (Intra-Core sync).
`--enable-hivm-inject-barrier-all-sync` bool false Make InjectSync inserts barrier(all) instructions (useful auto-sync fails)
`--enable-hivm-inject-block-all-sync` bool false Make InjectBlockSync inserts block(all) instructions (useful auto-sync fails)
`--enable-hivm-unit-flag-sync` bool false Enable unit-flag sync feature.
`--enable-hivm-graph-sync-solver` bool false Use GraphSyncSolver/CrossCoreGSS instead of InjectSync/InjectBlockSync for Intra-Core/Cross-Core auto-sync.

Constraints and capabilities

  • Hardware ordering model: Auto-sync orders execution by inserting HIVM synchronization ops (hivm.set_flag / hivm.wait_flag, hivm.pipe_barrier, and (when applicable) hivm.sync_block_set / hivm.sync_block_wait). The ordering is expressed in terms of cores and pipes, plus event/flag ids.

  • Correctness via feasibility checking: For the solver-based flow (Graph Sync Solver), candidate sync constraints are accepted only if they remain feasible under a graph-based reachability/ordering model (avoids deadlock or over-constraining schedules).

  • Kernel coverage (block-level sync): Block-level cross-core sync (sync_block_set / sync_block_wait) is intended for MIX kernels (cube/vector handoff). InjectBlockSync/CrossCoreGSS will not apply to non-MIX flows (host or pure AIC/AIV).

  • Optional feature modes: Unit-flag sync can be enabled as an alternative pattern for supported operations, and the graph-based solver can be selected instead of InjectSync/InjectBlockSync via compiler options.

  • Verification requirements: Check that emitted ops satisfy dialect verification; set_flag / wait_flag must share the same event/flag id and compatible core/pipe endpoints.