Auto-Sync¶
Auto-sync is the AscendNPU-IR (HIVM) compiler feature that automatically inserts synchronization operations so producers and consumers of shared data or resources are correctly ordered. Goals: correctness (no data races or ordering bugs) and minimal overhead (fewest syncs needed, reuse of hardware events when safe).
Hardware Background¶
AICore Architecture¶
HIVM Synchronization Operations¶
Synchronization ops are defined in HIVMIR/HIVMSynchronizationOps.td. Below they are described in terms of MLIR usage (operands/attributes), not assembly syntax.
Intra-Core-Sync (Normal-Sync)¶
- `hivm.set_flag`
  - Operands/attributes: `set_pipe`, `wait_pipe`, and `flag_id`.
  - Executes on `set_pipe` after all previous instructions on that pipe have finished.
  - Triggers `flag_id` on execution.
- `hivm.wait_flag`
  - Operands/attributes: `set_pipe`, `wait_pipe`, and `flag_id`.
  - Executes on `wait_pipe`.
  - Blocks all following instructions until `flag_id` is triggered.
- `hivm.pipe_barrier`
  - Operands/attributes: `pipe`.
  - Barrier across a given pipe.
  - Blocks all following instructions on `pipe` until all previous instructions finish.
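The ordering that a `set_flag`/`wait_flag` pair enforces can be pictured with a small toy simulator. This is only a sketch under stated assumptions: each pipe is modeled as an in-order instruction queue, a wait consumes the flag it waited for, and the pipe names `PIPE_MTE2` and `PIPE_V` as well as the `simulate` helper are illustrative, not part of the HIVM dialect.

```python
from collections import deque


def simulate(pipes):
    """Run per-pipe in-order queues and return the order in which work completes.

    pipes maps a pipe name to a list of instructions, each one of:
      ("op", name)                                ordinary work on this pipe
      ("set_flag", set_pipe, wait_pipe, flag_id)  trigger the flag
      ("wait_flag", set_pipe, wait_pipe, flag_id) stall until the flag is triggered
    """
    queues = {pipe: deque(instrs) for pipe, instrs in pipes.items()}
    triggered = set()   # flags that have been set but not yet waited on
    completed = []
    progress = True
    while progress and any(queues.values()):
        progress = False
        for pipe, queue in queues.items():
            if not queue:
                continue
            kind, *args = queue[0]
            key = tuple(args)
            if kind == "wait_flag" and key not in triggered:
                continue  # this pipe stalls; other pipes keep running
            queue.popleft()
            progress = True
            if kind == "set_flag":
                triggered.add(key)
            elif kind == "wait_flag":
                triggered.remove(key)  # assumption: a wait consumes the flag
            else:
                completed.append((pipe, args[0]))
    if any(queues.values()):
        raise RuntimeError("deadlock: a wait_flag never saw its set_flag")
    return completed


program = {
    # The vector pipe is listed first, yet must wait for the load to finish.
    "PIPE_V": [("wait_flag", "PIPE_MTE2", "PIPE_V", 0), ("op", "vadd")],
    "PIPE_MTE2": [("op", "load"), ("set_flag", "PIPE_MTE2", "PIPE_V", 0)],
}
print(simulate(program))  # [('PIPE_MTE2', 'load'), ('PIPE_V', 'vadd')]
```

Because the set is queued behind `load` on its own pipe, the flag is only triggered once the load has completed, which is what makes the consumer on the other pipe safe.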
Cross-Core-Sync (Block-Sync) (Intra-Block)¶
- `hivm.sync_block_set`
  - Operands/attributes:
    - `tcore_type`: target core type (vector/cube)
    - `tpipe`, `pipe` (set/wait pipes on target core)
    - `sync_instr_mode` (default `INTRA_BLOCK_SYNCHRONIZATION`)
    - `event_id`
  - Executes on `tpipe` (set_pipe) on the `tcore_type` core after all previous instructions on the same core.pipe finish.
  - Sets `event_id`.
- `hivm.sync_block_wait`
  - Operands/attributes:
    - `tcore_type`: target core type (vector/cube)
    - `tpipe`, `pipe` (set/wait pipes on target core)
    - `sync_instr_mode` (default `INTRA_BLOCK_SYNCHRONIZATION`)
    - `event_id`
  - Executes on `pipe` (pipe_wait) on the `tcore_type` core.
  - Blocks all following instructions on `pipe` on the `tcore_type` core until all previous instructions finish.
Algorithm principles¶
AutoSync solutions overview¶
The codebase provides two auto-sync solutions:
- Inject-Sync/Inject-Block-Sync Passes: Uses multiple passes to insert the needed sync operations, remove redundant ones, and allocate flag IDs/event IDs using liveness analysis. It is the primary solution and is enabled by default.
- Graph-Sync-Solver/Cross-Core-GSS Passes: Uses graph-based algorithms to analyze the input code structure and insert the needed sync operations. It remains optional and can be enabled via `-hivm-enable-graph-sync-solver=true` (or `sync_solver=True` in Triton-Ascend).
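To illustrate how liveness analysis lets event IDs be reused, here is a minimal sketch in the style of linear-scan allocation. It assumes straight-line code where each sync pair is live from its set position to its wait position; the function name `allocate_event_ids` and the `num_ids` limit are hypothetical and do not reflect the actual pass implementation, which also has to handle loops and branches.

```python
import heapq


def allocate_event_ids(pairs, num_ids):
    """Assign an event ID to each (set_pos, wait_pos) pair, reusing IDs when safe.

    An ID is considered busy from set_pos through wait_pos and may be reused
    by any pair whose set position comes strictly after that wait position.
    """
    order = sorted(range(len(pairs)), key=lambda i: pairs[i][0])
    free = list(range(num_ids))
    heapq.heapify(free)
    active = []                      # min-heap of (wait_pos, event_id)
    result = [None] * len(pairs)
    for i in order:
        set_pos, wait_pos = pairs[i]
        # Release IDs whose wait has already executed before this set.
        while active and active[0][0] < set_pos:
            _, freed = heapq.heappop(active)
            heapq.heappush(free, freed)
        if not free:
            raise ValueError("not enough event IDs for overlapping sync pairs")
        event_id = heapq.heappop(free)
        result[i] = event_id
        heapq.heappush(active, (wait_pos, event_id))
    return result


# Two overlapping pairs followed by two more overlapping pairs: two IDs suffice.
print(allocate_event_ids([(0, 3), (1, 2), (4, 6), (5, 7)], num_ids=2))  # [0, 1, 0, 1]
```

The design point is that the number of IDs needed is bounded by the maximum number of simultaneously live pairs, not by the total number of pairs.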
InjectSync¶

Purpose: Insert core-level (intra-core) synchronization (set_flag / wait_flag) using memory-dependence analysis, sync analysis, event-id allocation, and cleanup (move/remove redundant syncs).
Source:
- Headers: `include/../InjectSync/`
- Implementation: `lib/../InjectSync/` (e.g. `InjectSync.cpp`, `MemoryDependentAnalyzer.cpp`, `SyncAnalysis.cpp`, `SyncEventIdAllocation.cpp`, `IRTranslator.cpp`, `SyncCodegen.cpp`, `MoveSyncState.cpp`, `RemoveRedundantSync.cpp`, `SyncCommon.cpp`).
Stages:
- IRTranslator: Build Sync-IR from the input function (compound elements, loops, conditions, memory ops).
- SyncAnalyzer: For each pair of conflicting operations, insert a `set_flag`/`wait_flag` pair, or a pipe barrier if both operations are on the same pipe.
- MoveSyncState: Reposition sync ops to reduce stalls while preserving semantics.
- RemoveRedundantSync: Drop redundant sync pairs.
- SyncEventIdAllocation: Assign static or dynamic event IDs; reuse when safe.
- SyncCodegen: Emit `hivm.set_flag` / `hivm.wait_flag` / `hivm.pipe_barrier`.
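The SyncAnalyzer decision described above can be sketched as follows. This is a simplified model, not the pass itself: operations are reduced to the buffers they read and write, a conflict means a read-after-write, write-after-read, or write-after-write on a shared buffer, and the names `MemOp`, `conflicts`, and `plan_syncs` as well as the pipe and buffer names are assumptions made for illustration. No redundancy removal or repositioning is attempted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    name: str
    pipe: str
    reads: frozenset
    writes: frozenset


def conflicts(first, second):
    """True if the two ops have a RAW, WAW, or WAR hazard on any shared buffer."""
    return bool(first.writes & (second.reads | second.writes)) or bool(
        first.reads & second.writes
    )


def plan_syncs(ops):
    """Return (kind, producer, consumer) for every conflicting pair in program order."""
    plan = []
    for i, first in enumerate(ops):
        for second in ops[i + 1:]:
            if not conflicts(first, second):
                continue
            # Same pipe: a barrier on that pipe is enough. Different pipes:
            # a set on the producer pipe paired with a wait on the consumer pipe.
            kind = "pipe_barrier" if first.pipe == second.pipe else "set_flag/wait_flag"
            plan.append((kind, first.name, second.name))
    return plan


ops = [
    MemOp("load", "PIPE_MTE2", frozenset(), frozenset({"ub_a"})),
    MemOp("add", "PIPE_V", frozenset({"ub_a"}), frozenset({"ub_b"})),
    MemOp("mul", "PIPE_V", frozenset({"ub_b"}), frozenset({"ub_c"})),
    MemOp("store", "PIPE_MTE3", frozenset({"ub_c"}), frozenset()),
]
for entry in plan_syncs(ops):
    print(entry)
```

In this example the load-to-add and mul-to-store dependences cross pipes and get a set/wait pair, while add-to-mul stays on one pipe and gets a barrier.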
InjectBlockSync¶
Purpose: Insert block-level (intra-block) (cross-core) synchronization for MIX kernels (cube and vector): sync_block_set, sync_block_wait.
Source: `InjectBlockSync.cpp`, `InjectBlockSync.h`
Behavior:
- Runs only on MIX kernels (not host, not pure AIC/AIV).
- Inserts `SetFFTSBaseAddrOp` when an FFTS base address kernel argument is present.
- Three modes (controlled by options and fusion kind):
  - InjectAllBlockSync: Emit block sync before/after every `LoadOp` and every `StoreOp` (cube/vector handoff).
  - InjectBlockMixSync: Full mix: build block sync IR via `SyncBlockIRTranslator`, then run SyncAnalyzer (BLOCKSYNC mode), MoveSyncState, RemoveRedundantSync, SyncEventIdAllocation, SyncCodegen.
GraphSyncSolver¶

Purpose: Alternative to the Inject-Sync solution; it uses graph-based algorithms to decide when to insert pairs of set/wait operations and assign event IDs.
Source:
- Headers: `include/../GraphSyncSolver/`
- Implementation: `lib/../GraphSyncSolver/` (`GraphSyncSolver.cpp`, `SyncSolver.cpp`, `SyncSolverIR.cpp`, `SyncSolverIRTranslator.cpp`, `SyncSolverCodeGen.cpp`, `GraphSolver.cpp`, `EventIdSolver.cpp`, `Utility.cpp`, `SyncSolverTest.cpp`, `SyncSolverTester.cpp`).
Stages:
- IRTranslator: Build Sync-IR from the input function (function, scopes, loops, conditions, rw-operations).
- Solver: Collect conflict pairs (producer-consumer pairs), run pair selection and ordering, and optionally reuse conflict pairs to save event IDs.
- CodeGenerator: Translate the solver result back to MLIR: emit `hivm.set_flag` / `hivm.wait_flag` / `hivm.pipe_barrier`.
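One way to picture a graph-based acceptance test for candidate set/wait pairs is a cycle check on an ordering graph. The sketch below is an assumption-laden illustration rather than the GraphSyncSolver algorithm: every instruction is a node, each pipe contributes program-order edges, each candidate pair contributes an edge from its set to its wait, and the candidate set is treated as feasible only if the graph is acyclic. The helper name `is_feasible` and the pipe labels are hypothetical.

```python
from collections import defaultdict, deque


def is_feasible(pipe_lengths, sync_pairs):
    """Check that the sync pairs can be satisfied without deadlock.

    pipe_lengths: dict pipe -> number of in-order instructions on that pipe.
    sync_pairs:   list of ((set_pipe, set_idx), (wait_pipe, wait_idx)); the
                  wait instruction cannot complete before the set has executed.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = [(pipe, i) for pipe, n in pipe_lengths.items() for i in range(n)]
    # Program order inside each pipe.
    for pipe, n in pipe_lengths.items():
        for i in range(n - 1):
            successors[(pipe, i)].append((pipe, i + 1))
            indegree[(pipe, i + 1)] += 1
    # Cross-pipe ordering introduced by each set/wait pair.
    for set_node, wait_node in sync_pairs:
        successors[set_node].append(wait_node)
        indegree[wait_node] += 1
    # Kahn's algorithm: every node is scheduled iff the graph has no cycle.
    ready = deque(node for node in nodes if indegree[node] == 0)
    scheduled = 0
    while ready:
        node = ready.popleft()
        scheduled += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return scheduled == len(nodes)


pipes = {"A": 3, "B": 3}
ok = [(("A", 0), ("B", 2)), (("B", 0), ("A", 2))]
bad = [(("A", 2), ("B", 0)), (("B", 2), ("A", 0))]  # each pipe waits on the other's tail
print(is_feasible(pipes, ok), is_feasible(pipes, bad))  # True False
```

In the infeasible case each pipe starts with a wait whose matching set sits at the end of the other pipe, so neither pipe can ever make progress.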
CrossCoreGSS¶
Purpose: Insert block-level (intra-block) (cross-core) synchronization for MIX kernels (cube and vector): sync_block_set, sync_block_wait.
Source: `CrossCoreGSS.h`, `CrossCoreGSS.cpp`; reuses IRTranslator, Solver, and CodeGenerator from GraphSyncSolver.
How it works:
Same as the intra-core GSS pass, but it handles cross-core memory operations.
Interface description¶
Command Line Options¶
These are typically wired in the compiler driver (e.g. bishengir-hivm-compile); see Passes.td and tools under bishengir/lib/Tools/ for exact mapping.
| Flag | Type | Default | Description |
|---|---|---|---|
| `--disable-auto-inject-block-sync` | bool | false | Disable automatic block-level set/wait insertion (InjectBlockSync / CrossCoreGSS). |
| `--disable-hivm-auto-inject-sync` | bool | false | Disable InjectSync (Intra-Core sync). |
| `--enable-hivm-inject-barrier-all-sync` | bool | false | Make InjectSync insert barrier(all) instructions (useful when auto-sync fails). |
| `--enable-hivm-inject-block-all-sync` | bool | false | Make InjectBlockSync insert block(all) instructions (useful when auto-sync fails). |
| `--enable-hivm-unit-flag-sync` | bool | false | Enable unit-flag sync feature. |
| `--enable-hivm-graph-sync-solver` | bool | false | Use GraphSyncSolver/CrossCoreGSS instead of InjectSync/InjectBlockSync for Intra-Core/Cross-Core auto-sync. |
Constraints and capabilities¶
- Hardware ordering model: Auto-sync orders execution by inserting HIVM synchronization ops (`hivm.set_flag` / `hivm.wait_flag`, `hivm.pipe_barrier`, and (when applicable) `hivm.sync_block_set` / `hivm.sync_block_wait`). The ordering is expressed in terms of cores and pipes, plus event/flag IDs.
- Correctness via feasibility checking: For the solver-based flow (Graph Sync Solver), candidate sync constraints are accepted only if they remain feasible under a graph-based reachability/ordering model (avoids deadlock or over-constraining schedules).
- Kernel coverage (block-level sync): Block-level cross-core sync (`sync_block_set` / `sync_block_wait`) is intended for MIX kernels (cube/vector handoff). InjectBlockSync/CrossCoreGSS will not apply to non-MIX flows (host or pure AIC/AIV).
- Optional feature modes: Unit-flag sync can be enabled as an alternative pattern for supported operations, and the graph-based solver can be selected instead of InjectSync/InjectBlockSync via compiler options.
- Verification requirements: Check that emitted ops satisfy dialect verification; `set_flag` / `wait_flag` must share the same event/flag ID and compatible core/pipe endpoints.
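The pairing requirement in the last item can be checked with a simple script over a dump of emitted sync ops. This is a hypothetical post-check and not the dialect verifier: it assumes the ops have already been extracted into `(kind, set_pipe, wait_pipe, flag_id)` tuples, and it only compares static counts, so it ignores control flow such as loops and branches. The function name `check_flag_pairing` is illustrative.

```python
from collections import Counter


def check_flag_pairing(sync_ops):
    """Return a list of error strings for unmatched set_flag/wait_flag ops.

    A set and a wait match when they agree on (set_pipe, wait_pipe, flag_id).
    """
    sets, waits = Counter(), Counter()
    for kind, set_pipe, wait_pipe, flag_id in sync_ops:
        key = (set_pipe, wait_pipe, flag_id)
        if kind == "set_flag":
            sets[key] += 1
        elif kind == "wait_flag":
            waits[key] += 1
        else:
            raise ValueError(f"unexpected op kind: {kind}")
    errors = []
    for key in sorted(set(sets) | set(waits)):
        if sets[key] != waits[key]:
            errors.append(
                f"{key}: {sets[key]} set_flag vs {waits[key]} wait_flag"
            )
    return errors


emitted = [
    ("set_flag", "PIPE_MTE2", "PIPE_V", 0),
    ("wait_flag", "PIPE_MTE2", "PIPE_V", 0),
    ("set_flag", "PIPE_V", "PIPE_MTE3", 1),
    ("wait_flag", "PIPE_V", "PIPE_MTE3", 2),  # wrong flag ID on the wait
]
for error in check_flag_pairing(emitted):
    print(error)
```

Here the first pair matches, while the mismatched flag ID in the second pair is reported twice: once as a set without a wait and once as a wait without a set.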