Auto-Sync

Auto-sync is the AscendNPU-IR (HIVM) compiler feature that automatically inserts synchronization operations so that producers and consumers of shared data or resources are correctly ordered. It has two goals: correctness (no data races or ordering bugs) and minimal overhead (as few syncs as necessary, with hardware events reused when it is safe).

Hardware Background

AICore Architecture

For the AICore architecture, see the Ascend C operator development guide:
https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/83RC1/opdevg/Ascendcopdevg/atlas_ascendc_10_0008.html

HIVM Synchronization Operations

Synchronization ops are defined in HIVMIR/HIVMSynchronizationOps.td. Below they are described in terms of MLIR usage (operands/attributes), not assembly syntax.

Intra-Core-Sync (Normal-Sync)

hivm.set_flag
Operands/attributes: set_pipe, wait_pipe and flag_id
Executes on set_pipe after all previous instructions on that pipe have finished, and triggers flag_id when it executes.
hivm.wait_flag
Operands/attributes: set_pipe, wait_pipe and flag_id
Executes on wait_pipe.
Blocks all following instructions until flag_id is triggered by the matching hivm.set_flag.
hivm.pipe_barrier
Operands/attributes: pipe
Barrier on a single pipe.
Blocks all following instructions on pipe until all previous instructions on that pipe finish.
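The set_flag / wait_flag handshake above can be illustrated with a toy timing model. This is a minimal sketch under its own assumptions, not how HIVM or the hardware works: each pipe executes its instructions strictly one after another, every op has a fixed latency, and a flag is identified by (set_pipe, wait_pipe, flag_id) and consumed by a single wait. The pipe names MTE2 and V and the Instr/simulate helpers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    kind: str          # "op", "set_flag" or "wait_flag" (toy model, not HIVM types)
    name: str = ""     # label of an "op"
    cycles: int = 0    # latency of an "op"
    peer: str = ""     # wait_pipe for set_flag, set_pipe for wait_flag
    flag_id: int = 0

def simulate(pipes):
    """Return {op name: (start, end)} for strictly serial pipes synchronized by flags."""
    pc = {p: 0 for p in pipes}       # next instruction index on each pipe
    clock = {p: 0 for p in pipes}    # cycle at which each pipe becomes free
    triggered = {}                   # (set_pipe, wait_pipe, flag_id) -> trigger cycle
    timeline = {}
    progress = True
    while progress:
        progress = False
        for pipe, prog in pipes.items():
            while pc[pipe] < len(prog):
                ins = prog[pc[pipe]]
                if ins.kind == "op":
                    timeline[ins.name] = (clock[pipe], clock[pipe] + ins.cycles)
                    clock[pipe] += ins.cycles
                elif ins.kind == "set_flag":
                    # Fires once every earlier instruction on this pipe is done.
                    triggered[(pipe, ins.peer, ins.flag_id)] = clock[pipe]
                elif ins.kind == "wait_flag":
                    key = (ins.peer, pipe, ins.flag_id)
                    if key not in triggered:
                        break  # this pipe stalls until the matching set_flag fires
                    clock[pipe] = max(clock[pipe], triggered.pop(key))
                pc[pipe] += 1
                progress = True
    if any(pc[p] < len(prog) for p, prog in pipes.items()):
        raise RuntimeError("deadlock: a wait_flag never sees its set_flag")
    return timeline

# MTE2 loads a tile that the vector pipe (V) then consumes.
synced = {
    "MTE2": [Instr("op", "load_a", cycles=10),
             Instr("set_flag", peer="V", flag_id=0)],
    "V": [Instr("wait_flag", peer="MTE2", flag_id=0),
          Instr("op", "add", cycles=4)],
}
print(simulate(synced))  # {'load_a': (0, 10), 'add': (10, 14)}
```

Removing the two flag instructions lets add run in the window (0, 4), overlapping the load: the kind of race auto-sync exists to prevent. pipe_barrier is not modeled, because in this strictly serial model earlier instructions on a pipe always finish first.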

Cross-Core-Sync (Block-Sync) (Intra-Block)

hivm.sync_block_set
Operands/attributes:
- tcore_type: target core type (vector or cube)
- tpipe, pipe: set and wait pipes on the target core
- sync_instr_mode: defaults to INTRA_BLOCK_SYNCHRONIZATION
- event_id
Executes on tpipe (the set pipe) of the tcore_type core after all previous instructions on that core's pipe finish, then sets event_id.
hivm.sync_block_wait
Operands/attributes:
- tcore_type: target core type (vector or cube)
- tpipe, pipe: set and wait pipes on the target core
- sync_instr_mode: defaults to INTRA_BLOCK_SYNCHRONIZATION
- event_id
Executes on pipe (the wait pipe) of the tcore_type core.
Blocks all following instructions on pipe of that core until event_id is set by the matching hivm.sync_block_set.

Algorithm principles

AutoSync solutions overview

The codebase provides two auto-sync solutions:

Inject-Sync/Inject-Block-Sync Passes

Uses multiple passes to insert the needed sync operations, remove redundant ones, and allocate flag IDs/event IDs using liveness analysis. It is the primary solution and is enabled by default.
Graph-Sync-Solver/Cross-Core-GSS Passes

Uses graph-based algorithms to analyze the input code structure and insert needed sync operations. It remains optional and can be enabled via -hivm-enable-graph-sync-solver=true (or sync_solver=True in Triton-Ascend).

InjectSync

Purpose: Insert core-level (intra-core) synchronization (set_flag / wait_flag) using memory-dependence analysis, sync analysis, event-id allocation, and cleanup (move/remove redundant syncs).
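The memory-dependence and sync-analysis steps can be sketched on straight-line code. This is a simplified model, not the InjectSync implementation: each op is assumed to run on one pipe with known read/write buffer sets, any RAW, WAR or WAW overlap counts as a conflict, a same-pipe conflict gets a pipe barrier, and a cross-pipe conflict gets a set_flag/wait_flag pair. A conflict is skipped when an already-kept sync on the same pipes orders it, in the spirit of RemoveRedundantSync. Loops, branches, ordering through a third pipe, the Op type and the buffer names are all assumptions of the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    pipe: str                                  # pipe the op is issued to
    reads: set = field(default_factory=set)    # buffers read
    writes: set = field(default_factory=set)   # buffers written

def conflicts(a: Op, b: Op) -> bool:
    """RAW, WAR or WAW overlap between an earlier op a and a later op b."""
    return bool(a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)

def covered(kept, ops, i, j):
    """True if an already-kept sync orders ops[i] before ops[j]."""
    for kind, ki, kj in kept:
        if kind == "barrier":
            # A barrier right before ops[kj] orders all earlier ops on that
            # pipe before ops[kj] and every later op on the same pipe.
            if ops[i].pipe == ops[j].pipe == ops[kj].pipe and i < kj <= j:
                return True
        # set_flag after ops[ki] / wait_flag before ops[kj] also orders any
        # earlier op on ki's pipe before any later op on kj's pipe.
        elif (ops[ki].pipe == ops[i].pipe and ops[kj].pipe == ops[j].pipe
              and i <= ki and kj <= j):
            return True
    return False

def analyze(ops):
    """Return kept syncs as ("flag" | "barrier", producer index, consumer index)."""
    kept = []
    for j in range(len(ops)):
        for i in range(j - 1, -1, -1):          # latest producer first
            if not conflicts(ops[i], ops[j]) or covered(kept, ops, i, j):
                continue
            kind = "barrier" if ops[i].pipe == ops[j].pipe else "flag"
            kept.append((kind, i, j))
    return kept

ops = [
    Op("load_a", "MTE2", writes={"ub_a"}),
    Op("load_b", "MTE2", writes={"ub_b"}),
    Op("add", "V", reads={"ub_a", "ub_b"}, writes={"ub_c"}),
    Op("mul", "V", reads={"ub_c"}, writes={"ub_d"}),
    Op("store", "MTE3", reads={"ub_d"}),
]
print(analyze(ops))  # [('flag', 1, 2), ('barrier', 2, 3), ('flag', 3, 4)]
```

The load_a -> add conflict gets no sync of its own: the flag set after load_b already orders every earlier MTE2 op before add.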

Source:

Headers: include/../InjectSync/.
Implementation: lib/../InjectSync/ (e.g. InjectSync.cpp, MemoryDependentAnalyzer.cpp, SyncAnalysis.cpp, SyncEventIdAllocation.cpp, IRTranslator.cpp, SyncCodegen.cpp, MoveSyncState.cpp, RemoveRedundantSync.cpp, SyncCommon.cpp).

Stages:

IRTranslator:
Build Sync-IR from the input function (compound elements, loops, conditions, memory ops).
SyncAnalyzer:
For each pair of conflicting operations, insert a set_flag/wait_flag pair, or a pipe barrier if both operations run on the same pipe.
MoveSyncState:
Reposition sync ops to reduce stalls while preserving semantics.
RemoveRedundantSync:
Drop redundant sync pairs.
SyncEventIdAllocation:
Assign static or dynamic event IDs; reuse when safe.
SyncCodegen:
Emit hivm.set_flag / hivm.wait_flag / hivm.pipe_barrier.
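Event-ID reuse in the SyncEventIdAllocation stage can be pictured as interval allocation: a flag is live from its set_flag to its wait_flag, and two syncs can share an ID only if their live ranges do not overlap. The sketch below uses a greedy linear scan under assumptions of its own: straight-line program positions, a separate ID space per (set_pipe, wait_pipe) channel, and a placeholder budget of 8 IDs per channel. It does not model loops or the static/dynamic ID distinction.

```python
from collections import defaultdict

MAX_IDS = 8  # assumed number of flag IDs per (set_pipe, wait_pipe) channel

def allocate_event_ids(syncs):
    """Greedy linear-scan flag-ID assignment with reuse.

    syncs: list of (set_pipe, wait_pipe, set_pos, wait_pos); positions are
    straight-line program-order indices with set_pos < wait_pos.
    Returns the flag ID chosen for each sync, in input order.
    """
    order = sorted(range(len(syncs)), key=lambda k: syncs[k][2])
    busy_until = defaultdict(dict)  # channel -> {flag_id: wait_pos of last user}
    ids = [None] * len(syncs)
    for k in order:
        set_pipe, wait_pipe, set_pos, wait_pos = syncs[k]
        live = busy_until[(set_pipe, wait_pipe)]
        # An ID can be reused once its previous wait_flag precedes this set_flag.
        free = [f for f in range(MAX_IDS) if live.get(f, -1) < set_pos]
        if not free:
            raise RuntimeError(f"no free flag ID on {set_pipe}->{wait_pipe}")
        ids[k] = free[0]
        live[free[0]] = wait_pos
    return ids

syncs = [
    ("MTE2", "V", 1, 2),   # first tile: load -> compute
    ("MTE2", "V", 3, 5),   # starts after the first wait, so ID 0 is reused
    ("MTE2", "V", 4, 6),   # overlaps the previous one, needs a second ID
    ("V", "MTE3", 5, 7),   # different channel, independent ID space
]
print(allocate_event_ids(syncs))  # [0, 0, 1, 0]
```

When every ID on a channel is live at once the sketch raises; a real allocator would need a fallback at that point (for example, a stronger sync), which is outside this model.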

InjectBlockSync

Purpose: Insert block-level (intra-block, cross-core) synchronization, sync_block_set and sync_block_wait, for MIX kernels (cube and vector).

Source: InjectBlockSync.h, InjectBlockSync.cpp

Behavior:

Runs only on MIX kernels (not host, not pure AIC/AIV).
Inserts SetFFTSBaseAddrOp when an FFTS base address kernel argument is present.
Modes (controlled by options and fusion kind) include:
- InjectAllBlockSync: emit block sync before/after every LoadOp and every StoreOp (cube/vector handoff).
- InjectBlockMixSync: full mix mode; build block sync IR via SyncBlockIRTranslator, then run SyncAnalyzer (BLOCKSYNC mode), MoveSyncState, RemoveRedundantSync, SyncEventIdAllocation and SyncCodegen.

GraphSyncSolver

Purpose: Alternative to the Inject-Sync solution; it uses graph-based algorithms to decide when to insert pairs of set/wait operations and assign event IDs.

Source:

Headers: include/../GraphSyncSolver/
Implementation: lib/../GraphSyncSolver/ (GraphSyncSolver.cpp, SyncSolver.cpp, SyncSolverIR.cpp, SyncSolverIRTranslator.cpp, SyncSolverCodeGen.cpp, GraphSolver.cpp, EventIdSolver.cpp, Utility.cpp, SyncSolverTest.cpp, SyncSolverTester.cpp).

Stages:

IRTranslator:
Build Sync-IR from the input function (function, scopes, loops, conditions, rw-operations).
Solver:
Collect conflict pairs (producer–consumer pairs), run pair selection and ordering, and optionally reuse conflict pairs to save event IDs.
CodeGenerator:
Translate the solver result back to MLIR: emit hivm.set_flag / hivm.wait_flag / hivm.pipe_barrier.
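The feasibility check behind the solver-based flow can be illustrated with reachability on a happens-before graph. In this sketch, which is not the GraphSolver code, nodes are instructions, edges are program order plus accepted set-to-wait syncs, and a candidate sync u -> v is rejected when v already reaches u (adding it would close a cycle, i.e. a deadlock) and skipped when u already reaches v (the ordering is already implied). The OrderGraph class and the node names are hypothetical, and the graph is assumed loop-free.

```python
from collections import defaultdict

class OrderGraph:
    """Happens-before graph: an edge u -> v means u must finish before v starts."""

    def __init__(self):
        self.succ = defaultdict(set)

    def add_edge(self, u, v):
        self.succ[u].add(v)

    def reaches(self, src, dst):
        """Iterative DFS: is there a path from src to dst?"""
        stack, seen = [src], {src}
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            for nxt in self.succ[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    def try_add_sync(self, u, v):
        """Accept the sync u -> v unless it is implied or would deadlock."""
        if self.reaches(u, v):
            return "implied"      # ordering already holds; no new sync needed
        if self.reaches(v, u):
            return "infeasible"   # would create a cycle, i.e. a deadlock
        self.add_edge(u, v)
        return "added"

g = OrderGraph()
for a, b in [("cube.mmad", "cube.fixpipe"),
             ("vec.load", "vec.add"), ("vec.add", "vec.store")]:
    g.add_edge(a, b)  # program order within each core
print(g.try_add_sync("cube.fixpipe", "vec.load"))  # added
print(g.try_add_sync("cube.mmad", "vec.add"))      # implied
print(g.try_add_sync("vec.store", "cube.mmad"))    # infeasible
```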

CrossCoreGSS

Purpose: Insert block-level (intra-block, cross-core) synchronization, sync_block_set and sync_block_wait, for MIX kernels (cube and vector).

Source: CrossCoreGSS.h, CrossCoreGSS.cpp. It reuses IRTranslator, Solver, and CodeGenerator from GraphSyncSolver.

How it works:

It works like the intra-core GSS pass, but handles cross-core memory operations and emits sync_block_set / sync_block_wait instead of set_flag / wait_flag.
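The split between the intra-core and cross-core passes comes down to where the producer and consumer of a conflict run. The sketch below encodes that rule as this page describes it; the Endpoint type and the pipe names are illustrative assumptions, and the real passes decide placement on Sync-IR rather than from a pair of endpoints.

```python
from typing import NamedTuple

class Endpoint(NamedTuple):
    core: str   # "cube" or "vector", matching tcore_type
    pipe: str   # illustrative pipe name, e.g. "FIX", "MTE2", "V"

def sync_kind(producer: Endpoint, consumer: Endpoint) -> str:
    """Pick the family of HIVM sync op that orders producer before consumer."""
    if producer.core != consumer.core:
        return "sync_block_set/sync_block_wait"  # cross-core: InjectBlockSync / CrossCoreGSS
    if producer.pipe != consumer.pipe:
        return "set_flag/wait_flag"              # intra-core, different pipes
    return "pipe_barrier"                        # intra-core, same pipe

print(sync_kind(Endpoint("cube", "FIX"), Endpoint("vector", "MTE2")))
print(sync_kind(Endpoint("vector", "MTE2"), Endpoint("vector", "V")))
print(sync_kind(Endpoint("vector", "V"), Endpoint("vector", "V")))
```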

Interface description

Command Line Options

These options are typically wired into the compiler driver (e.g. bishengir-hivm-compile); see Passes.td and the tools under bishengir/lib/Tools/ for the exact mapping.

Flag	Type	Default	Description
`--disable-auto-inject-block-sync`	bool	false	Disable automatic block-level set/wait insertion (InjectBlockSync / CrossCoreGSS).
`--disable-hivm-auto-inject-sync`	bool	false	Disable InjectSync (Intra-Core sync).
`--enable-hivm-inject-barrier-all-sync`	bool	false	Make InjectSync insert barrier(all) instructions (useful as a fallback when auto-sync fails).
`--enable-hivm-inject-block-all-sync`	bool	false	Make InjectBlockSync insert block(all) instructions (useful as a fallback when auto-sync fails).
`--enable-hivm-unit-flag-sync`	bool	false	Enable unit-flag sync feature.
`--enable-hivm-graph-sync-solver`	bool	false	Use GraphSyncSolver/CrossCoreGSS instead of InjectSync/InjectBlockSync for Intra-Core/Cross-Core auto-sync.

Constraints and capabilities

Hardware ordering model: Auto-sync orders execution by inserting HIVM synchronization ops (hivm.set_flag / hivm.wait_flag, hivm.pipe_barrier and, when applicable, hivm.sync_block_set / hivm.sync_block_wait). The ordering is expressed in terms of cores and pipes, plus event/flag IDs.
Correctness via feasibility checking: In the solver-based flow (Graph Sync Solver), candidate sync constraints are accepted only if they remain feasible under a graph-based reachability/ordering model, which avoids deadlocks and over-constrained schedules.
Kernel coverage (block-level sync): Block-level cross-core sync (sync_block_set / sync_block_wait) is intended for MIX kernels (cube/vector handoff). InjectBlockSync and CrossCoreGSS do not apply to non-MIX flows (host or pure AIC/AIV).
Optional feature modes: Unit-flag sync can be enabled as an alternative pattern for supported operations, and the graph-based solver can be selected instead of InjectSync/InjectBlockSync through compiler options.
Verification requirements: Emitted ops must pass dialect verification, and each set_flag / wait_flag pair must use the same event/flag ID and compatible core/pipe endpoints.
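The set_flag / wait_flag pairing requirement can be checked mechanically on a flat list of emitted sync ops. This sketch uses a hypothetical SyncOp record rather than the dialect's C++ op classes, assumes straight-line code in which each set_flag precedes its wait_flag, and also reports any set_flag that is never waited on (an assumption of this checker, not a stated dialect rule). Dialect verification proper and the cross-core sync_block_set / sync_block_wait rules are not modeled.

```python
from collections import Counter
from typing import NamedTuple

class SyncOp(NamedTuple):
    kind: str        # "set_flag" or "wait_flag"
    set_pipe: str
    wait_pipe: str
    flag_id: int

def check_pairing(ops):
    """Return a list of error strings; an empty list means the pairing is consistent."""
    pending = Counter()  # (set_pipe, wait_pipe, flag_id) -> outstanding set_flags
    errors = []
    for idx, op in enumerate(ops):
        key = (op.set_pipe, op.wait_pipe, op.flag_id)
        if op.kind == "set_flag":
            pending[key] += 1
        elif op.kind == "wait_flag":
            if pending[key] == 0:
                errors.append(f"op {idx}: wait_flag {key} has no matching set_flag")
            else:
                pending[key] -= 1
        else:
            errors.append(f"op {idx}: unknown kind {op.kind!r}")
    errors += [f"set_flag {k} never waited on" for k, n in pending.items() if n > 0]
    return errors

ok = [SyncOp("set_flag", "MTE2", "V", 0), SyncOp("wait_flag", "MTE2", "V", 0)]
bad = [SyncOp("set_flag", "MTE2", "V", 0), SyncOp("wait_flag", "MTE2", "V", 1)]
print(check_pairing(ok))  # []
for msg in check_pairing(bad):
    print(msg)
```

The mismatched flag_id in bad yields two errors: a wait with no matching set, and a set that is never waited on.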