# Auto-Sync

Auto-sync is the AscendNPU-IR (HIVM) compiler feature that automatically inserts synchronization operations so that producers and consumers of shared data or resources are correctly ordered. Its goals are **correctness** (no data races or ordering bugs) and **minimal overhead** (the fewest syncs needed, with hardware events reused when safe).

## Hardware Background

### AICore Architecture

### HIVM Synchronization Operations

Synchronization ops are defined in `HIVMIR/HIVMSynchronizationOps.td`. They are described below in terms of **MLIR usage** (operands/attributes), not assembly syntax.

#### Intra-Core-Sync (Normal-Sync)

- **`hivm.set_flag`**
  - Operands/attributes: `set_pipe`, `wait_pipe`, and `flag_id`.
  - Executes on `set_pipe` after all previous instructions on that pipe have finished.
  - Triggers `flag_id` on execution.
- **`hivm.wait_flag`**
  - Operands/attributes: `set_pipe`, `wait_pipe`, and `flag_id`.
  - Executes on `wait_pipe`.
  - Blocks all following instructions until `flag_id` is triggered.
- **`hivm.pipe_barrier`**
  - Operands/attributes: `pipe`.
  - Barrier across a given pipe: blocks all following instructions on `pipe` until all previous instructions finish.

#### Cross-Core-Sync (Block-Sync) (Intra-Block)

- **`hivm.sync_block_set`**
  - Operands/attributes:
    - `tcore_type`: target core type (vector/cube)
    - `tpipe`, `pipe`: set/wait pipes on the target core
    - `sync_instr_mode` (default `INTRA_BLOCK_SYNCHRONIZATION`)
    - `event_id`
  - Executes on `tpipe` (the set pipe) of the `tcore_type` core after all previous instructions on the same core and pipe finish.
  - Sets `event_id`.
- **`hivm.sync_block_wait`**
  - Operands/attributes:
    - `tcore_type`: target core type (vector/cube)
    - `tpipe`, `pipe`: set/wait pipes on the target core
    - `sync_instr_mode` (default `INTRA_BLOCK_SYNCHRONIZATION`)
    - `event_id`
  - Executes on `pipe` (the wait pipe) of the `tcore_type` core.
  - Blocks all following instructions on `pipe` of the `tcore_type` core until `event_id` is set.

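
The set/wait semantics described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not how the hardware or the compiler is implemented: the pipe names `PIPE_A` and `PIPE_B`, the tuple encoding of instructions, and the `simulate` helper are all hypothetical, instructions are treated as completing instantly, and a flag is assumed to be consumed by the `wait_flag` that observes it.

```python
from collections import deque


def simulate(pipes):
    """Run per-pipe instruction queues in program order; return the global trace.

    Hypothetical instruction encoding (not HIVM syntax):
      ("op", name)            ordinary work on the pipe
      ("set_flag", flag_id)   triggers flag_id once earlier work on the pipe is done
      ("wait_flag", flag_id)  stalls the pipe until flag_id has been triggered
    """
    queues = {pipe: deque(instrs) for pipe, instrs in pipes.items()}
    flags = set()   # currently triggered flag ids
    trace = []      # (pipe, op name) in the order the ops actually ran
    progressed = True
    while progressed and any(queues.values()):
        progressed = False
        for pipe, queue in queues.items():
            if not queue:
                continue
            kind, arg = queue[0]
            if kind == "wait_flag":
                if arg not in flags:
                    continue            # this pipe stalls; let other pipes run
                flags.discard(arg)      # assumption: the wait consumes the flag
            elif kind == "set_flag":
                flags.add(arg)
            else:
                trace.append((pipe, arg))
            queue.popleft()
            progressed = True
    if any(queues.values()):
        raise RuntimeError("deadlock: a wait_flag has no matching set_flag")
    return trace


# The consumer pipe is listed first, yet it cannot run until the producer sets flag 0.
program = {
    "PIPE_B": [("wait_flag", 0), ("op", "consume buffer")],
    "PIPE_A": [("op", "produce buffer"), ("set_flag", 0)],
}
trace = simulate(program)
```

In this model the trace always lists `produce buffer` before `consume buffer`, and a `wait_flag` without a matching `set_flag` is reported as a deadlock.
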
## Algorithm principles

### AutoSync solutions overview

The codebase provides **two** auto-sync solutions:

- **`Inject-Sync/Inject-Block-Sync`** passes: Use multiple passes to insert the needed sync operations, remove redundant ones, and allocate flag IDs/event IDs using liveness analysis. This is the primary solution and is enabled by default.
- **`Graph-Sync-Solver/Cross-Core-GSS`** passes: Use graph-based algorithms to analyze the input code structure and insert the needed sync operations. This solution is optional and can be enabled via `-hivm-enable-graph-sync-solver=true` (or `sync_solver=True` in Triton-Ascend).

### InjectSync

![alt text](../../../../images/developer_guide/auto_sync0.png)

**Purpose:** Insert core-level (intra-core) synchronization (`set_flag` / `wait_flag`) using memory-dependence analysis, sync analysis, event-id allocation, and cleanup (moving or removing redundant syncs).

**Source:**

- Headers: `include/../InjectSync/`.
- Implementation: `lib/../InjectSync/` (e.g. `InjectSync.cpp`, `MemoryDependentAnalyzer.cpp`, `SyncAnalysis.cpp`, `SyncEventIdAllocation.cpp`, `IRTranslator.cpp`, `SyncCodegen.cpp`, `MoveSyncState.cpp`, `RemoveRedundantSync.cpp`, `SyncCommon.cpp`).

**Stages:**

1. **IRTranslator**: Build Sync-IR from the input function (compound elements, loops, conditions, memory ops).
2. **SyncAnalyzer**: For each pair of conflicting operations, insert a `set_flag` / `wait_flag` pair, or a pipe barrier if both operations run on the same pipe.
3. **MoveSyncState**: Reposition sync ops to reduce stalls while preserving semantics.
4. **RemoveRedundantSync**: Drop redundant sync pairs.
5. **SyncEventIdAllocation**: Assign static or dynamic event IDs; reuse them when safe.
6. **SyncCodegen**: Emit `hivm.set_flag` / `hivm.wait_flag` / `hivm.pipe_barrier`.

### InjectBlockSync

**Purpose:** Insert block-level (intra-block, cross-core) synchronization for **MIX** kernels (cube and vector): `sync_block_set`, `sync_block_wait`.


**Source:** `InjectBlockSync.cpp`, `InjectBlockSync.h`

**Behavior:**

- Runs only on **MIX** kernels (not host, not pure AIC/AIV).
- Inserts `SetFFTSBaseAddrOp` when an FFTS base-address kernel argument is present.
- Three modes (controlled by options and fusion kind):
  - **InjectAllBlockSync**: Emit block sync before/after every `LoadOp` and every `StoreOp` (cube/vector handoff).
  - **InjectBlockMixSync**: Full mix. Build block sync IR via `SyncBlockIRTranslator`, then run SyncAnalyzer (BLOCKSYNC mode), MoveSyncState, RemoveRedundantSync, SyncEventIdAllocation, and SyncCodegen.

### GraphSyncSolver

![alt text](../../../../images/developer_guide/auto_sync1.png)

**Purpose:** Alternative to the Inject-Sync solution; it uses graph-based algorithms to decide where to insert set/wait pairs and how to assign event IDs.

**Source:**

- Headers: `include/../GraphSyncSolver/`
- Implementation: `lib/../GraphSyncSolver/` (`GraphSyncSolver.cpp`, `SyncSolver.cpp`, `SyncSolverIR.cpp`, `SyncSolverIRTranslator.cpp`, `SyncSolverCodeGen.cpp`, `GraphSolver.cpp`, `EventIdSolver.cpp`, `Utility.cpp`, `SyncSolverTest.cpp`, `SyncSolverTester.cpp`).

**Stages:**

1. **IRTranslator**: Build Sync-IR from the input function (function, scopes, loops, conditions, rw-operations).
2. **Solver**: Collect conflict pairs (producer-consumer pairs), run pair selection and ordering, and optionally reuse conflict pairs to save event IDs.
3. **CodeGenerator**: Translate the solver result back to MLIR by emitting `hivm.set_flag` / `hivm.wait_flag` / `hivm.pipe_barrier`.

### CrossCoreGSS

**Purpose:** Insert block-level (intra-block, cross-core) synchronization for **MIX** kernels (cube and vector): `sync_block_set`, `sync_block_wait`.

**Source:** `CrossCoreGSS.h`, `CrossCoreGSS.cpp`; reuses `IRTranslator`, `Solver`, and `CodeGenerator` from GraphSyncSolver.

**How it works:**

- Same as the intra-core GSS pass, but it handles cross-core memory operations.

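
To make the conflict-pair idea shared by both solutions more concrete, here is a minimal Python sketch of the naive insertion rule: a set/wait pair when the two conflicting operations run on different pipes, a pipe barrier when they share a pipe. It is an illustration under assumptions, not the code in `SyncAnalysis.cpp` or `SyncSolver.cpp`: the `Op` record, the `conflicts` and `insert_syncs` helpers, the pipe and buffer names, and the one-fresh-id-per-pair policy are all hypothetical, and no redundancy removal, sync motion, or id reuse is attempted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for a memory operation in the Sync-IR."""
    name: str
    pipe: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(a, b):
    """True if b must be ordered after a (RAW, WAR or WAW on a shared buffer)."""
    return bool(a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)


def insert_syncs(ops):
    """Return a flat instruction list with naive sync ops added."""
    before = {i: [] for i in range(len(ops))}   # syncs emitted before op i
    after = {i: [] for i in range(len(ops))}    # syncs emitted after op i
    next_id = 0                                 # assumption: fresh id per pair
    for j, consumer in enumerate(ops):
        for i in range(j):
            producer = ops[i]
            if not conflicts(producer, consumer):
                continue
            if producer.pipe == consumer.pipe:
                barrier = ("pipe_barrier", consumer.pipe)
                if barrier not in before[j]:
                    before[j].append(barrier)
            else:
                after[i].append(("set_flag", producer.pipe, consumer.pipe, next_id))
                before[j].append(("wait_flag", producer.pipe, consumer.pipe, next_id))
                next_id += 1
    out = []
    for i, op in enumerate(ops):
        out.extend(before[i])
        out.append(("op", op.name))
        out.extend(after[i])
    return out


example_ops = [
    Op("load", "PIPE_A", writes=frozenset({"buf0"})),
    Op("compute", "PIPE_B", reads=frozenset({"buf0"}), writes=frozenset({"buf1"})),
    Op("scale", "PIPE_B", reads=frozenset({"buf1"}), writes=frozenset({"buf1"})),
]
result = insert_syncs(example_ops)
```

Here `load` and `compute` conflict across pipes and receive a set/wait pair with id 0, while `compute` and `scale` conflict on the same pipe and receive a barrier. The later stages described above (moving syncs, removing redundant pairs, reusing event IDs) would then operate on such a result.
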
## Interface description

### Command Line Options

These options are typically wired in the compiler driver (e.g. `bishengir-hivm-compile`); see `Passes.td` and the tools under `bishengir/lib/Tools/` for the exact mapping.

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| `--disable-auto-inject-block-sync` | bool | false | Disable automatic block-level set/wait insertion (InjectBlockSync / CrossCoreGSS). |
| `--disable-hivm-auto-inject-sync` | bool | false | Disable InjectSync (intra-core sync). |
| `--enable-hivm-inject-barrier-all-sync` | bool | false | Make InjectSync insert barrier(all) instructions (useful when auto-sync fails). |
| `--enable-hivm-inject-block-all-sync` | bool | false | Make InjectBlockSync insert block(all) instructions (useful when auto-sync fails). |
| `--enable-hivm-unit-flag-sync` | bool | false | Enable the unit-flag sync feature. |
| `--enable-hivm-graph-sync-solver` | bool | false | Use GraphSyncSolver/CrossCoreGSS instead of InjectSync/InjectBlockSync for intra-core/cross-core auto-sync. |

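
One way to read the table is as a small decision function from options to the solution that handles each sync level. The Python sketch below is a reading aid only and makes several assumptions that the table does not confirm: the `SyncOptions` record and `select_sync_solutions` helper are hypothetical, the disable flags are assumed to take precedence over everything else, and the graph solver is assumed to take precedence over the barrier-all and block-all fallbacks. The actual wiring lives in the compiler driver and `Passes.td`.

```python
from dataclasses import dataclass


@dataclass
class SyncOptions:
    """Hypothetical mirror of the flags in the table (defaults as listed)."""
    disable_auto_inject_block_sync: bool = False
    disable_hivm_auto_inject_sync: bool = False
    enable_hivm_inject_barrier_all_sync: bool = False
    enable_hivm_inject_block_all_sync: bool = False
    enable_hivm_graph_sync_solver: bool = False


def select_sync_solutions(opts, is_mix_kernel):
    """Return which solution handles each sync level (None means disabled)."""
    intra_core = None
    if not opts.disable_hivm_auto_inject_sync:        # assumed: disable wins
        if opts.enable_hivm_graph_sync_solver:        # assumed precedence
            intra_core = "GraphSyncSolver"
        elif opts.enable_hivm_inject_barrier_all_sync:
            intra_core = "InjectSync (barrier-all)"
        else:
            intra_core = "InjectSync"

    cross_core = None
    # Block-level sync only applies to MIX kernels.
    if is_mix_kernel and not opts.disable_auto_inject_block_sync:
        if opts.enable_hivm_graph_sync_solver:        # assumed precedence
            cross_core = "CrossCoreGSS"
        elif opts.enable_hivm_inject_block_all_sync:
            cross_core = "InjectBlockSync (block-all)"
        else:
            cross_core = "InjectBlockSync"

    return {"intra_core": intra_core, "cross_core": cross_core}


default_choice = select_sync_solutions(SyncOptions(), is_mix_kernel=True)
solver_choice = select_sync_solutions(
    SyncOptions(enable_hivm_graph_sync_solver=True), is_mix_kernel=True
)
```

With all defaults on a MIX kernel this model selects InjectSync and InjectBlockSync, matching the statement that the Inject-Sync solution is the default.
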
## Constraints and capabilities

- **Hardware ordering model:** Auto-sync orders execution by inserting HIVM synchronization ops (`hivm.set_flag` / `hivm.wait_flag`, `hivm.pipe_barrier`, and, when applicable, `hivm.sync_block_set` / `hivm.sync_block_wait`). The ordering is expressed in terms of **cores** and **pipes**, plus event/flag IDs.
- **Correctness via feasibility checking:** In the solver-based flow (Graph Sync Solver), candidate sync constraints are accepted only if they remain feasible under a graph-based reachability/ordering model. This avoids deadlock and over-constrained schedules.
- **Kernel coverage (block-level sync):** Block-level cross-core sync (`sync_block_set` / `sync_block_wait`) is intended for **MIX** kernels (cube/vector handoff). InjectBlockSync/CrossCoreGSS do not apply to non-MIX flows (host or pure AIC/AIV).
- **Optional feature modes:** Unit-flag sync can be enabled as an alternative pattern for supported operations, and the graph-based solver can be selected instead of InjectSync/InjectBlockSync via compiler options.
- **Verification requirements:** Emitted ops must satisfy dialect verification; a paired `set_flag` / `wait_flag` must share the same event/flag ID and have compatible core/pipe endpoints.

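
The pairing requirement in the last bullet can be sketched as a simple check over a straight-line instruction list. This is a hypothetical standalone lint written in Python, not the MLIR dialect verifier: the tuple encoding `(kind, set_pipe, wait_pipe, flag_id)` and the `check_flag_pairing` helper are assumptions for illustration, and loops or branches (where a wait may textually precede its set) are deliberately out of scope.

```python
def check_flag_pairing(instrs):
    """Return a list of error strings; an empty list means all pairs match.

    Each sync instruction is assumed to be encoded as
    (kind, set_pipe, wait_pipe, flag_id) with kind "set_flag" or "wait_flag".
    Other instructions are ignored.
    """
    pending = {}   # (set_pipe, wait_pipe, flag_id) -> count of unmatched sets
    errors = []
    for idx, ins in enumerate(instrs):
        kind, key = ins[0], tuple(ins[1:])
        if kind == "set_flag":
            pending[key] = pending.get(key, 0) + 1
        elif kind == "wait_flag":
            if pending.get(key, 0) == 0:
                errors.append(f"instr {idx}: wait_flag{key} has no earlier matching set_flag")
            else:
                pending[key] -= 1
    for key, count in pending.items():
        if count:
            errors.append(f"set_flag{key} is never waited on ({count} unmatched)")
    return errors


good = [
    ("op", "load"),
    ("set_flag", "PIPE_A", "PIPE_B", 0),
    ("wait_flag", "PIPE_A", "PIPE_B", 0),
    ("op", "compute"),
]
bad = [
    ("set_flag", "PIPE_A", "PIPE_B", 0),
    ("wait_flag", "PIPE_A", "PIPE_B", 1),   # flag id does not match the set
]
good_errors = check_flag_pairing(good)
bad_errors = check_flag_pairing(bad)
```

For the `bad` list the check reports two problems: the wait on id 1 has no matching set, and the set on id 0 is never waited on.
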