# 'hivm' Dialect Passes

## `-auto-blockify-parallel-loop`

_Enable auto loop on blocks when logical blocknum is larger than physical one_

## `-compose-collapse-expand`

_Compose collapse and expand op_

## `-convert-non-contiguous-reshape-to-copy`

_Generate copy for reassociative reshape that might be non-contiguous_

## `-convert-to-hivm-op`

_Convert Ops from other dialects to HIVM Ops_

## `-cv-pipelining`

_Cube and vector core pipelining for multi-buffer'ed mix-cv ops_

### Options

```text
-enable-auto-balance : Enable balancing of vector subtasks during pipelining.
```

## `-hivm-add-ffts-to-syncblocksetop`

_Add FFTS (arg0) to SyncBlockSetOp_

This pass adds FFTS (arg0) to SyncBlockSetOp.

## `-hivm-aggregated-decompose-op`

_Decompose hivm ops that use hivm AggregatedOpInterface_

### Options

```text
-decompose-phase : Specify which decompose phase to apply.
```

## `-hivm-align-alloc-size`

_Automatically align memref.alloc size for special hivm op that has to access aligned size_

Some hivm ops can only access sizes aligned to the hardware unit size. For these cases, this pass adjusts the memref.alloc size to avoid out-of-bounds accesses.

## `-hivm-alloc-extra-buffer`

_Allocate additional temporary buffer for op needed_

## `-hivm-auto-infer-buffer-size`

_Auto infer buffer size_

Infer buffer sizes by inserting annotation.mark ops.

## `-hivm-bind-sub-block`

_Tile and bind sub block_

## `-hivm-bind-sync-block-lock-arg`

_Bind func argument with hacc.syncblocklock to CreateSyncBlockLockOp_

## `-hivm-bind-workspace-arg`

_Bind func argument with hacc.workspace to AllocWorkspaceOp_

## `-hivm-bubble-up-extract-slice`

_Tile and bind sub block_

## `-hivm-clone-tensor-empty`

_Output clones to different empty tensors based on hivmOp._

This pass clones a separate tensor.empty for each hivmOp output.

## `-hivm-constantize-buffer-size`

_Try to constantize dynamic shape buffers._

This pass tries to constantize dynamic shape buffers by upper-bounding their original shape.
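The upper-bounding idea can be illustrated with a small Python model. This is a hypothetical sketch, not the pass implementation:

* a shape is a list in which `None` marks a dynamic dimension;
* `upper_bounds` is an assumed analysis result that maps the index of a dynamic dimension to a known upper bound;
* the helper names `constantize_shape` and `static_alloc_bytes` are invented for illustration.

If any dynamic dimension has no known bound, the sketch gives up and the buffer stays dynamic.

```python
from math import prod
from typing import Dict, List, Optional


# Hypothetical model of buffer-shape constantization (not the actual pass code).
# A dynamic dimension is None; `upper_bounds` maps the index of a dynamic
# dimension to an assumed, already-known upper bound for that dimension.
def constantize_shape(shape: List[Optional[int]],
                      upper_bounds: Dict[int, int]) -> Optional[List[int]]:
    static_shape = []
    for i, dim in enumerate(shape):
        if dim is not None:
            static_shape.append(dim)              # already static: keep as-is
        elif i in upper_bounds:
            static_shape.append(upper_bounds[i])  # use the known upper bound
        else:
            return None                           # unbounded: stay dynamic
    return static_shape


def static_alloc_bytes(static_shape: List[int], elem_bytes: int) -> int:
    # Byte size of the new static alloc; the original dynamic-shaped buffer
    # would be a subview of this allocation.
    return prod(static_shape) * elem_bytes


# Usage: a [?, 64] buffer whose dynamic dimension is known to be <= 128.
shape = constantize_shape([None, 64], {0: 128})
print(shape)                         # [128, 64]
print(static_alloc_bytes(shape, 2))  # 16384 (f16 = 2 bytes per element)
```

Refusing when a bound is missing reflects the "try to" wording of the pass summary: only fully bounded shapes can become a single static allocation.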
If successful, a new static-shaped alloc is created and subviewed to the original shape for further use.

## `-hivm-decompose-op`

_Decompose compound hivm op to multiple hivm ops according to hardware ability._

This pass decomposes a compound hivm op into multiple hivm ops according to hardware ability. For example, the hardware cannot cast f32 to i8 directly. Such a cast is therefore decomposed into an f32-to-f16 cast op followed by an f16-to-i8 cast op.

In dynamic cases, the pass creates an annotation.mark op with the attribute buffer_size_in_byte for the allocated extra buffer. The value of buffer_size_in_byte is the same as for the src or dst operand of the original op.

## `-hivm-enable-multi-buffer`

_Enable multi buffer_

This pass enables multi-buffering for a hivm op if the op is marked with "hivm.multi_buffer".

## `-hivm-enable-stride-align`

_Align memref allocations according to stride align marks_

Re-allocate memrefs according to the annotations of storage_align marks.

## `-hivm-flatten-ops`

_Flatten HIVM ops._

## `-hivm-graph-sync-solver`

_Graph sync solver_

### Options

```text
-enable-unit-flag : Enable unit-flag modes for synchronization
```

## `-hivm-infer-data-layout`

_Infer data layout for HIVM Ops_

## `-hivm-infer-func-core-type`

_Infer the core type of each function_

## `-hivm-infer-mem-scope`

_Infer memory scope for HIVM Ops_

## `-hivm-init-entry-kernel`

_Insert set_mask_norm() at the beginning of entry kernel_

## `-hivm-inject-block-sync`

_Auto inject block sync_

### Options

```text
-block-all-sync : Enable inject all block sync for HIVM injectBlockSync.
-assume-alive-loops : Assume that all loops (forOp whileOp) will execute at least once.
-disable-auto-inject-block-sync : Toggle auto set/wait insertion, always keep SetFFTSBaseAddrOp
```

## `-hivm-inject-sync`

_Auto inject sync_

### Options

```text
-sync-mode : inject sync mode (default is inject normal)
-enable-unit-flag : Enable unit-flag modes for synchronization
-assume-alive-loops : Assume that all loops (forOp whileOp) will execute at least once.
```

## `-hivm-inline-fixpipe`

_Convert ops to HIVM Fixpipe op_

## `-hivm-inline-load-copy`

_Inline Copied load_

## `-hivm-inline-otf-broadcast`

_Inline OTF broadcast_

## `-hivm-inline-otf-load-store`

_On the fly Inline Load and Store operations_

## `-hivm-insert-infer-sync-block-lock-num-and-init-func`

_Insert infer-sync-block-lock callback func for host_

Calculate the total static sync block lock number and init, and then create a host callback that returns this size.

## `-hivm-insert-infer-task-type-func`

_Infer the module's task type and emit a host function that returns it._

Detect whether the module is CubeVectorMix, CubeOnly, VectorOnly or Unknown. Then emit a host-side function `_infer_task_type_function` that does two things:

* it returns an i8 constant encoding the detected type;
* it is marked with the appropriate HACC host-function attributes.

## `-hivm-insert-infer-workspace-size-func`

_Insert infer-workspace callback func for host_

Calculate the total static workspace size after the plan-workspace pass, and then create a host callback that returns this size.

## `-hivm-insert-init-and-finish-for-debug`

_Insert init and finish for debug_

## `-hivm-insert-load-store-for-mix-cv`

_Insert load store op for mix cv_

## `-hivm-insert-nz2nd-for-debug`

_Insert nz2nd for debug_

## `-hivm-lift-lowest-stride`

_Lift lowest stride of operands of hivm ops_

For most hivm structured ops, this pass lifts the lowest stride of an operand if its last dimension is not contiguous. Exceptions: MacroOp and VArangeOp.

For example, consider an operand of type `memref<16xf16, strided<[8]>>`. After LiftLowestStride, its type becomes `memref<16x1xf16, strided<[8, 1]>>`, whose last dimension is contiguous.
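The type rewrite in the example above can be modeled in a few lines of Python. This is an illustrative sketch only, with these assumptions:

* the helpers `lift_lowest_stride` and `memref_type` are hypothetical;
* only the shape and stride bookkeeping is modeled, not the MacroOp/VArangeOp exceptions or the IR rewrite itself;
* lifting appends a unit dimension with stride 1 while keeping the element type and the existing strides unchanged, as the example shows.

```python
from typing import List, Tuple


# Hypothetical model of the LiftLowestStride type rewrite (not the pass code).
def lift_lowest_stride(shape: List[int],
                       strides: List[int]) -> Tuple[List[int], List[int]]:
    if not strides or strides[-1] == 1:
        # Rank-0, or last dimension already contiguous: nothing to lift.
        return list(shape), list(strides)
    # Append a unit dimension with stride 1. The original strides are kept,
    # so the set of addressed elements does not change.
    return list(shape) + [1], list(strides) + [1]


def memref_type(shape: List[int], strides: List[int], elem: str) -> str:
    # Render a strided memref type string in MLIR's textual form.
    dims = "x".join(str(d) for d in shape)
    stride_list = ", ".join(str(s) for s in strides)
    return f"memref<{dims}x{elem}, strided<[{stride_list}]>>"


shape, strides = lift_lowest_stride([16], [8])
print(memref_type(shape, strides, "f16"))  # memref<16x1xf16, strided<[8, 1]>>
```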
## `-hivm-lift-zero-rank`

## `-hivm-lower-create-sync-block-lock`

_Lower CreateSyncBlockLockOp to ViewOp_

## `-hivm-lower-to-loops`

_Lower hivm ops to loops_

## `-hivm-map-forall-to-blocks`

_Map forall to hivm blocks._

This pass maps each scf.forall operation to HIVM block ops. The mapping is one-to-one, and the induction variables of scf.forall are rewritten to hivm block idx ops.

## `-hivm-mark-disable-load`

_Mark the memref.loads that need to disable dcache_

## `-hivm-mark-multi-buffer`

_Mark multi buffer for HIVM Ops_

This pass marks multi-buffering for hivm ops if the option enable-auto is true; buffers with scope L0C are never marked. If enable-auto is false, the pass does nothing.

### Options

```text
-enable-auto : Mark multi buffer automatically.
-limit-auto-multi-buffer-only-for-local-buffer : Disable multi-buffer mark on workspace
-limit-auto-multi-buffer-of-local-buffer : Limit local buffer auto multi buffer
-limit-mix-auto-multi-buffer-buffer : Disable multi-buffer-buffer on cube, vector Or no limit
-set-workspace-multibuffer : Override for multibuffer number for workspace
```

## `-hivm-mark-real-core-type`

_Mark scalar operations with core-type attribute._

### Options

```text
-remove-core-type-attrs : Remove all core type attributes. If set to true, this pass becomes a cleanup pass.
```

## `-hivm-mark-stride-align`

_Automatically annotate stride_align marks for operands of hivm ops_

For all hivm ops, automatically annotate their memref operands with storage_align marks.

## `-hivm-memref-alloc-to-alloca`

_Convert local AllocOp to AllocaOp_

This pass replaces every memref.alloc in a non-global memory space with memref.alloca.

## `-hivm-normalize-loop-iterator`

_Normalize special state of loop iterator before plan-memory_

## `-hivm-normalize-matmul`

_Normalize hivm matmul op_

## `-hivm-opt-func-output`

_Try to optimize function output after bufferization._

Try to remove unnecessary address returns.

## `-hivm-opt-single-point`

_Optimize single point hivm op by scalar operation._

This pass optimizes single-point hivm ops by replacing them with scalar operations.

## `-hivm-plan-memory`

_Plan memory for HIVM Ops_

### Options

```text
-mem-plan-mode : plan mem mode (default is LOCAL_MEM_PLAN)
-enable-global-workspace-reuse : Enable global workspace reuse, default : false
-restrict-inplace-as-isa : restrict memory inplace as isa, default : false
```

## `-hivm-recognize-deinterleave-op`

_Optimize discontinuous access to deinterleave._

This pass optimizes discontinuous memory access using deinterleave.

## `-hivm-reduce-rank-subview`

_Reduce rank using subview_

## `-hivm-set-buffer-size`

## `-hivm-split-mix-kernel`

_Split Mix device functions into AICube and AIVector functions._

Split mix kernels into separate AICube and AIVector kernels, and mark the parent module as a Mix module.

Note:

* If a Mix kernel is called within a Host function, a function declaration is generated for the final kernel launch. Calling a Mix kernel within a device function is currently not supported.

Input:

```mlir
// Mix kernel: cube and vector ops in the same function
func (workspace) attribute {tcore_type = #hivm.tcore_type} {
  t = cube_op ins() outs(workspace)
  ... = vector_op ins(t) ...
}
```

Output:

```mlir
// AICube kernel
func (workspace) attribute {tcore_type = #hivm.tcore_type} {
  t = cube_op ins() outs(workspace)
  annotation.mark t // mark to avoid dce
}
// AIVector kernel: reads the cube result through the workspace
func (workspace) attribute {tcore_type = #hivm.tcore_type} {
  ... = vector_op ins(workspace) ...
}
```

## `-hivm-sync-block-hoisting`

_Hoist syncblock lock and unlock operation to the parent region if it is in the scf.for or scf.while_

## `-hivm-tile-batchmm-into-loop`

_Tile batch matmul into loop with iteration on batch dimension_

## `-insert-workspace-for-mix-cv`

_Insert workspace for mix cv_

## `-tile-cube-vector-loop`

_Tile cube and vector loops on local buffer_

This pass attempts to tile cube and vector ops again on the local buffer because:

1. it reduces the number of inter-core synchronizations, which are costly.
2. it allows larger tiling sizes.

### Options

```text
-tile-mix-vector-loop : The trip count of the tiled vector loop for mix kernels
-tile-mix-cube-loop : The trip count of the tiled cube loop for mix kernels
```

## `-triton-global-kernel-args-to-hivm-op`