‘hivm’ Dialect Passes¶

`-auto-blockify-parallel-loop`¶

Automatically loop over blocks when the logical block count is larger than the physical one
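One way to picture this, as a minimal sketch rather than the pass's actual lowering: when a kernel asks for more logical blocks than the hardware provides, each physical block can loop over several logical block ids. The strided assignment and the names `logical_ids_for`, `logical_blocks` and `physical_blocks` are assumptions made for illustration.

```python
def logical_ids_for(physical_id: int, logical_blocks: int, physical_blocks: int) -> list[int]:
    """Logical block ids handled by one physical block (assumed strided mapping)."""
    return list(range(physical_id, logical_blocks, physical_blocks))


# 10 logical blocks spread over 4 physical blocks.
for pid in range(4):
    print(pid, logical_ids_for(pid, logical_blocks=10, physical_blocks=4))
```

With this mapping every logical block id is visited exactly once, whatever the ratio between the two counts.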

`-compose-collapse-expand`¶

Compose collapse and expand ops

`-convert-non-contiguous-reshape-to-copy`¶

Generate copy for reassociative reshape that might be non-contiguous

`-convert-to-hivm-op`¶

Convert Ops from other dialects to HIVM Ops

`-cv-pipelining`¶

Cube and vector core pipelining for multi-buffered mix-cv ops

Options¶

-enable-auto-balance : Enable balancing of vector subtasks during pipelining.

`-hivm-add-ffts-to-syncblocksetop`¶

Add FFTS (arg0) to SyncBlockSetOp

This pass adds FFTS (arg0) to SyncBlockSetOp.

`-hivm-aggregated-decompose-op`¶

Decompose hivm ops that use hivm AggregatedOpInterface

Options¶

-decompose-phase : Specify which decompose phase to apply.

`-hivm-align-alloc-size`¶

Automatically align the memref.alloc size for special hivm ops that must access an aligned size

Some hivm ops can only access memory in sizes aligned to the hardware unit size, so this pass adjusts the memref.alloc size in those cases to avoid out-of-bounds accesses.
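The size adjustment amounts to rounding up to a multiple of the hardware unit. A minimal sketch, assuming a hypothetical 32-byte unit (the real unit size depends on the op and the target, and is not stated here):

```python
def align_up(size: int, unit: int) -> int:
    """Round size up to the next multiple of unit."""
    if unit <= 0:
        raise ValueError("unit must be positive")
    return (size + unit - 1) // unit * unit


HW_UNIT_BYTES = 32  # hypothetical unit size, for illustration only

requested = 100
print(align_up(requested, HW_UNIT_BYTES))  # 128
```

Allocating the rounded-up size means an op that always touches whole units stays inside the buffer.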

`-hivm-alloc-extra-buffer`¶

Allocate additional temporary buffers for ops that need them

`-hivm-auto-infer-buffer-size`¶

Auto infer buffer size

Infer buffer size by inserting annotation.mark Op

`-hivm-bind-sub-block`¶

Tile and bind sub block

`-hivm-bind-sync-block-lock-arg`¶

Bind func argument with hacc.syncblocklock to CreateSyncBlockLockOp

`-hivm-bind-workspace-arg`¶

Bind func argument with hacc.workspace to AllocWorkspaceOp

`-hivm-bubble-up-extract-slice`¶

Tile and bind sub block

`-hivm-clone-tensor-empty`¶

Output clones to different empty tensors based on hivmOp.

This pass clones different tensor.empty ops for the outputs of hivmOp.

`-hivm-constantize-buffer-size`¶

Try to constantize dynamic shape buffers.

This pass tries to constantize dynamic shape buffers by upper-bounding their original shape. If successful, a new, static shaped alloc will be created and subviewed to the original shape for further use.
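The idea can be sketched outside MLIR: allocate a buffer of the static upper-bound size once, then hand out a view of the original dynamic size. This is only an analogy, using a `bytearray` in place of the static alloc and a `memoryview` slice in place of the subview; the function name `constantize` is hypothetical.

```python
def constantize(dynamic_size: int, upper_bound: int) -> tuple[bytearray, memoryview]:
    """Return a static-size buffer and a view of the original (dynamic) size."""
    if dynamic_size > upper_bound:
        raise ValueError("dynamic size exceeds the assumed upper bound")
    static_alloc = bytearray(upper_bound)            # static-shaped allocation
    view = memoryview(static_alloc)[:dynamic_size]   # view with the original shape
    return static_alloc, view


alloc, view = constantize(dynamic_size=24, upper_bound=64)
view[0] = 7  # writes through the view land in the static allocation
print(len(alloc), len(view), alloc[0])
```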

`-hivm-decompose-op`¶

Decompose compound hivm op to multiple hivm ops according to hardware ability.

This pass decomposes a compound hivm op into multiple hivm ops according to hardware ability. For example, the hardware cannot cast f32 to i8 directly, so the cast is decomposed into an f32-to-f16 cast op followed by an f16-to-i8 cast op. In dynamic cases, the pass creates an annotation.markOp with the attribute buffer_size_in_byte for each extra buffer it allocates; the value of buffer_size_in_byte is the same as for the src or dst operand of the original op.
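The two-step cast can be sketched numerically. This is an illustration, not the hardware's definition: Python floats are 64-bit, so the input only stands in for an f32 value, and the truncate-and-saturate behaviour of the second step is an assumption.

```python
import struct


def f32_to_f16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def f16_to_i8(x: float) -> int:
    """Truncate toward zero and saturate to the i8 range (assumed semantics)."""
    return max(-128, min(127, int(x)))


def f32_to_i8(x: float) -> int:
    """The compound cast, expressed as the two supported casts in sequence."""
    return f16_to_i8(f32_to_f16(x))


print(f32_to_f16(3.7), f32_to_i8(3.7))
```

Note that the intermediate f16 step is lossy: 3.7 becomes 3.69921875 before the integer conversion.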

`-hivm-enable-multi-buffer`¶

Enable multi buffer

This pass enables multi-buffering for a hivm op if the op is marked “hivm.multi_buffer”.

`-hivm-enable-stride-align`¶

Align memref allocations according to stride align marks

Re-allocate memrefs according to the annotations of storage_align marks

`-hivm-flatten-ops`¶

Flatten HIVM ops.

`-hivm-graph-sync-solver`¶

Graph sync solver

Options¶

-enable-unit-flag : Enable unit-flag modes for synchronization

`-hivm-infer-data-layout`¶

Infer data layout for HIVM Ops

`-hivm-infer-func-core-type`¶

Infer the core type of each function

`-hivm-infer-mem-scope`¶

Infer memory scope for HIVM Ops

`-hivm-init-entry-kernel`¶

Insert set_mask_norm() at the beginning of entry kernel

`-hivm-inject-block-sync`¶

Auto inject block sync

Options¶

-block-all-sync                 : Enable inject all block sync for HIVM injectBlockSync.
-assume-alive-loops             : Assume that all loops (forOp whileOp) will execute at least once.
-disable-auto-inject-block-sync : Toggle auto set/wait insertion, always keep SetFFTSBaseAddrOp

`-hivm-inject-sync`¶

Auto inject sync

Options¶

-sync-mode          : inject sync mode (default is inject normal)
-enable-unit-flag   : Enable unit-flag modes for synchronization
-assume-alive-loops : Assume that all loops (forOp whileOp) will execute at least once.

`-hivm-inline-fixpipe`¶

Convert ops to HIVM Fixpipe op

`-hivm-inline-load-copy`¶

Inline Copied load

`-hivm-inline-otf-broadcast`¶

Inline OTF broadcast

`-hivm-inline-otf-load-store`¶

On the fly Inline Load and Store operations

`-hivm-insert-infer-sync-block-lock-num-and-init-func`¶

Insert infer-sync-block-lock callback func for host

Calculate the total static sync block lock num and init, and then create a host callback to return this size

`-hivm-insert-infer-task-type-func`¶

Infer the module’s task type and emit a host function that returns it.

Detect whether the module is CubeVectorMix, CubeOnly, VectorOnly or Unknown, then emit a host‑side function <original_func>_infer_task_type_function that returns an i8 constant encoding the detected type and is marked with the appropriate HACC host‑function attributes.
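The classification and the naming pattern can be sketched as follows. The numeric encoding and the idea of classifying from a set of core-type strings are assumptions for illustration; only the four categories and the `_infer_task_type_function` suffix come from the description above.

```python
from enum import IntEnum


class TaskType(IntEnum):
    # Hypothetical i8 encoding; the real constants are defined by the compiler.
    UNKNOWN = 0
    CUBE_ONLY = 1
    VECTOR_ONLY = 2
    CUBE_VECTOR_MIX = 3


def infer_task_type(core_types: set[str]) -> TaskType:
    """Classify a module from the core types its kernels use (assumed input)."""
    has_cube = "CUBE" in core_types
    has_vector = "VECTOR" in core_types
    if has_cube and has_vector:
        return TaskType.CUBE_VECTOR_MIX
    if has_cube:
        return TaskType.CUBE_ONLY
    if has_vector:
        return TaskType.VECTOR_ONLY
    return TaskType.UNKNOWN


def host_function_name(original_func: str) -> str:
    """Name of the emitted host function, following the pattern given above."""
    return f"{original_func}_infer_task_type_function"


print(host_function_name("my_kernel"), int(infer_task_type({"CUBE", "VECTOR"})))
```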

`-hivm-insert-infer-workspace-size-func`¶

Insert infer-workspace callback func for host

Calculate the total static workspace size after the plan-workspace pass, and then create a host callback to return this size

`-hivm-insert-init-and-finish-for-debug`¶

Insert init and finish for debug

`-hivm-insert-load-store-for-mix-cv`¶

Insert load store op for mix cv

`-hivm-insert-nz2nd-for-debug`¶

Insert nz2nd for debug

`-hivm-lift-lowest-stride`¶

Lift lowest stride of operands of hivm ops

For most hivm structured ops, lift the lowest stride of the operands if the last dim is not contiguous.

Exceptions: MacroOp and VArangeOp.

For example, if the type of an operand is memref<16xf16, strided<[8]>>, then after LiftLowestStride the type is memref<16x1xf16, strided<[8, 1]>>, whose last dim is contiguous.
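The shape and stride rewrite in this example can be written out as a small sketch. The function name and the tuple representation are illustrative; the rule shown (append a unit dim with stride 1 when the last stride is not 1) is inferred from the example above.

```python
def lift_lowest_stride(shape: tuple[int, ...], strides: tuple[int, ...]):
    """Append a unit dim with stride 1 when the last dim is not contiguous."""
    if strides and strides[-1] != 1:
        return shape + (1,), strides + (1,)
    return shape, strides


# memref<16x..., strided<[8]>> becomes memref<16x1x..., strided<[8, 1]>>
print(lift_lowest_stride((16,), (8,)))
# An already-contiguous last dim is left untouched.
print(lift_lowest_stride((4, 8), (8, 1)))
```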

`-hivm-lift-zero-rank`¶

`-hivm-lower-create-sync-block-lock`¶

Lower CreateSyncBlockLockOp to ViewOp

`-hivm-lower-to-loops`¶

Lower hivm ops to loops

`-hivm-map-forall-to-blocks`¶

Map forall to hivm blocks.

This pass maps each scf.forall operation to HIVM block ops. The mapping is one-to-one, and the induction variables of scf.forall are rewritten to hivm block idx ops.

`-hivm-mark-disable-load`¶

Mark the memref.loads that need to disable dcache

`-hivm-mark-multi-buffer`¶

Mark multi buffer for HIVM Ops

This pass marks multi buffer for hivm ops if the option enable-auto is true. Note that buffers with scope L0C are not marked. If enable-auto is false, the pass does nothing.

Options¶

-enable-auto                                   : Mark multi buffer automatically.
-limit-auto-multi-buffer-only-for-local-buffer : Disable multi-buffer mark on workspace
-limit-auto-multi-buffer-of-local-buffer       : Limit local buffer auto multi buffer
-limit-mix-auto-multi-buffer-buffer            : Disable multi-buffer-buffer on cube, vector, or no limit
-set-workspace-multibuffer                     : Override for multibuffer number for workspace

`-hivm-mark-real-core-type`¶

Mark scalar operations with core-type attribute.

Options¶

-remove-core-type-attrs : Remove all core type attributes. If set to true, this pass becomes a cleanup pass.

`-hivm-mark-stride-align`¶

Automatically annotate stride_align marks for operands of hivm ops

For all hivm ops, annotate their memref operands with storage_align marks automatically

`-hivm-memref-alloc-to-alloca`¶

Convert local AllocOp to AllocaOp

This pass replaces every memref.alloc in a non-global memory space with memref.alloca.

`-hivm-normalize-loop-iterator`¶

Normalize special state of loop iterator before plan-memory

`-hivm-normalize-matmul`¶

Normalize hivm matmul op

`-hivm-opt-func-output`¶

Try to optimize function output after bufferization.

Try to remove unnecessary address return.

`-hivm-opt-single-point`¶

Optimize single point hivm op by scalar operation.

This pass optimizes single point hivm ops by using scalar operations.

`-hivm-plan-memory`¶

Plan memory for HIVM Ops

Options¶

-mem-plan-mode                 : Plan mem mode (default: LOCAL_MEM_PLAN)
-enable-global-workspace-reuse : Enable global workspace reuse (default: false)
-restrict-inplace-as-isa       : Restrict memory inplace as isa (default: false)

`-hivm-recognize-deinterleave-op`¶

Optimize discontinuous access to deinterleave.

This pass optimizes discontinuous memory access by using deinterleave.

`-hivm-reduce-rank-subview`¶

Reduce rank using subview

`-hivm-set-buffer-size`¶

`-hivm-split-mix-kernel`¶

Split Mix device functions into AICube and AIVector functions.

Split mix kernels into separate AICube and AIVector kernels, and mark the parent module as a Mix module.

Note:

If a Mix kernel is called within a Host function, a function declaration is generated for the final kernel launch. Calling a Mix kernel within a device function is currently not supported.

Input:

func (workspace) attribute {tcore_type = #hivm.tcore_type<CUBE_OR_VECTOR>} {
  t = cube_op ins() outs(workspace)
  ... = vector_op ins(t) ...
}

Output:

func (workspace) attribute {tcore_type = #hivm.tcore_type<CUBE>} {
  t = cube_op ins() outs(workspace)
  annotation.mark t // mark to avoid dce
}

func (workspace) attribute {tcore_type = #hivm.tcore_type<VECTOR>} {
  ... = vector_op ins(workspace) ...
}

`-hivm-sync-block-hoisting`¶

Hoist syncblock lock and unlock operations to the parent region if they are inside an scf.for or scf.while

`-hivm-tile-batchmm-into-loop`¶

Tile batch matmul into loop with iteration on batch dimension
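Conceptually, the rewrite turns one batched matmul into a loop whose body is an ordinary 2-D matmul on one batch slice. A minimal sketch using nested Python lists; the helper names are illustrative, and no claim is made about the IR the pass actually emits.

```python
def matmul(a, b):
    """Plain 2-D matrix multiply on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def batch_matmul_as_loop(a, b):
    """Batched matmul expressed as a loop over the batch dimension."""
    if len(a) != len(b):
        raise ValueError("batch sizes differ")
    out = []
    for i in range(len(a)):  # one iteration per batch index
        out.append(matmul(a[i], b[i]))
    return out


a = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
b = [[[5, 6], [7, 8]], [[9, 10], [11, 12]]]
print(batch_matmul_as_loop(a, b))
```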

`-insert-workspace-for-mix-cv`¶

Insert workspace for mix cv

`-tile-cube-vector-loop`¶

Tile cube and vector loops on local buffer

This pass attempts to tile cube and vector ops again on the local buffer for two reasons: it reduces the number of inter-core synchronizations, which are costly, and it allows a larger tiling size.

Options¶

-tile-mix-vector-loop : The trip count of the tiled vector loop for mix kernels
-tile-mix-cube-loop   : The trip count of the tiled cube loop for mix kernels

`-triton-global-kernel-args-to-hivm-op`¶