# 'hfusion' Dialect Passes

## `-adapt-triton-kernel`

_Adapt the Triton kernel_

### Options

```text
-hivmc-version : Specify hivmc version to resolve backward compatibility
```

## `-hfusion-add-ffts-addr`

_Add FFTS base address to func param and annotation_

### Options

```text
-force-add-ffts-addr : Force adding the FFTS base addr at the user-specified param location. The default value -1 means no insertion; 0 means insert at the first param location.
```

## `-hfusion-auto-schedule`

_Auto-schedule fused kernels._

### Options

```text
-block-dim                      : Number of blocks to use
-enable-auto-multi-buffer       : Enable auto multi-buffer
-enable-deterministic-computing : Enable deterministic computing
-max-buffer-count-tuning        : Allow maxBufferCnt tuning
-enable-count-buffer-dma-opt    : If enabled, the buffer used by DMA operations will not be reused by vector operations
-enable-manage-host-resources   : Enable managing resources for host functions
-cube-tiling-tuning             : Allow cube tiling params tuning
-external-tiling-func-path      : Auto add external tiling func
-enable-symbol-analysis         : Enable symbol analysis for tiling and fusion
```

## `-hfusion-cache-io`

_Cache input and output arguments_

## `-hfusion-cache-io-for-return-arg`

_Cache arguments that are returned directly_

## `-hfusion-compose-multi-reduce`

_Compose multi-reduce optimization_

### Options

```text
-max-compose   : Maximum number of reduces composed into a single operation; -1 is limitless
-max-dist-diff : Maximum distance difference from the common ancestor
-aggressive    : Aggressive mode will try to reshape if shapes are loosely matched
```

## `-hfusion-constantize-tiling-data`

_Propagate constants between tiling and device functions_

Propagate constants from the tiling calculation to the device function. Modifications made:

* Constant tiling data are inlined into the device function.
* Constant tiling data are removed from the tiling function.
* Constant tiling data are removed from the arguments of the device function, and the call sites are modified accordingly.
* Constant tiling data are removed from the callers of the device function, the callers of the callers, and so on.

Constraints/Assumptions:

* For all device functions sharing the same tiling function, the order of the tiling data arguments is exactly the same.
* The tiling arguments in a device function's input arguments have exactly the same order as the return values of the tiling function.

Input

```mlir
func.func @tiling_func(%arg0: tensor<?x?xf16>) -> (i64, i64) attributes {hacc.function_kind = #hacc.function_kind} {
  %ret0 = "some_calculation"() : () -> i64
  %ret1 = arith.constant 42 : i64
  return %ret0, %ret1 : i64, i64
}
func.func @device_kernel_tiling_0(%arg0: tensor<?x?xf16>, %arg1: i64 {hacc.tiling_data}, %arg2: i64 {hacc.tiling_data}) -> tensor<?x?xf16> attributes {hacc.function_kind = #hacc.function_kind, hacc.tiling_func = "tiling_func"} {
  "some_use"(%arg1) : (i64) -> ()
  "some_use"(%arg2) : (i64) -> ()
  %ret0 = "some_op"(%arg0) : (tensor<?x?xf16>) -> tensor<?x?xf16>
  return %ret0 : tensor<?x?xf16>
}
func.func @device_kernel_tiling_1(%arg0: tensor<?x?xf16>, %arg1: i64 {hacc.tiling_data}, %arg2: i64 {hacc.tiling_data}) -> tensor<?x?xf16> attributes {hacc.function_kind = #hacc.function_kind, hacc.tiling_func = "tiling_func"} {
  "some_use"(%arg1) : (i64) -> ()
  "some_use"(%arg2) : (i64) -> ()
  %ret0 = "some_op"(%arg0) : (tensor<?x?xf16>) -> tensor<?x?xf16>
  return %ret0 : tensor<?x?xf16>
}
func.func @main(%arg0: tensor<?x?xf16>, %arg1: i64 {hacc.tiling_data}, %arg2: i64 {hacc.tiling_data}) -> tensor<?x?xf16> attributes {hacc.function_kind = #hacc.function_kind} {
  %0 = arith.index_castui %arg1 : i64 to index
  %1 = scf.index_switch %0 -> tensor<?x?xf16>
  case 1 {
    %2 = func.call @device_kernel_tiling_1(%arg0, %arg1, %arg2) : (tensor<?x?xf16>, i64, i64) -> tensor<?x?xf16>
    scf.yield %2 : tensor<?x?xf16>
  }
  case 0 {
    %2 = func.call @device_kernel_tiling_0(%arg0, %arg1, %arg2) : (tensor<?x?xf16>, i64, i64) -> tensor<?x?xf16>
    scf.yield %2 : tensor<?x?xf16>
  }
  default {
    %false = arith.constant false
    cf.assert %false, "Invalid tiling key"
    %2 = ub.poison : tensor<?x?xf16>
    scf.yield %2 : tensor<?x?xf16>
  }
  return %1 : tensor<?x?xf16>
}
```

Output

```mlir
func.func @tiling_func(%arg0: tensor<?x?xf16>) -> (i64) attributes {hacc.function_kind = #hacc.function_kind} {
  %ret0 = "some_calculation"() : () -> i64
  return %ret0 : i64
}
func.func @device_kernel_tiling_0(%arg0: tensor<?x?xf16>, %arg1: i64 {hacc.tiling_data}) -> tensor<?x?xf16> attributes {hacc.function_kind = #hacc.function_kind, hacc.tiling_func = "tiling_func"} {
  "some_use"(%arg1) : (i64) -> ()
  %arg2 = arith.constant 42 : i64
  "some_use"(%arg2) : (i64) -> ()
  %ret0 = "some_op"(%arg0) : (tensor<?x?xf16>) -> tensor<?x?xf16>
  return %ret0 : tensor<?x?xf16>
}
func.func @device_kernel_tiling_1(%arg0: tensor<?x?xf16>, %arg1: i64 {hacc.tiling_data}) -> tensor<?x?xf16> attributes {hacc.function_kind = #hacc.function_kind, hacc.tiling_func = "tiling_func"} {
  "some_use"(%arg1) : (i64) -> ()
  %arg2 = arith.constant 42 : i64
  "some_use"(%arg2) : (i64) -> ()
  %ret0 = "some_op"(%arg0) : (tensor<?x?xf16>) -> tensor<?x?xf16>
  return %ret0 : tensor<?x?xf16>
}
func.func @main(%arg0: tensor<?x?xf16>, %arg1: i64 {hacc.tiling_data}) -> tensor<?x?xf16> attributes {hacc.function_kind = #hacc.function_kind} {
  %0 = arith.index_castui %arg1 : i64 to index
  %1 = scf.index_switch %0 -> tensor<?x?xf16>
  case 1 {
    %2 = func.call @device_kernel_tiling_1(%arg0, %arg1) : (tensor<?x?xf16>, i64) -> tensor<?x?xf16>
    scf.yield %2 : tensor<?x?xf16>
  }
  case 0 {
    %2 = func.call @device_kernel_tiling_0(%arg0, %arg1) : (tensor<?x?xf16>, i64) -> tensor<?x?xf16>
    scf.yield %2 : tensor<?x?xf16>
  }
  default {
    %false = arith.constant false
    cf.assert %false, "Invalid tiling key"
    %2 = ub.poison : tensor<?x?xf16>
    scf.yield %2 : tensor<?x?xf16>
  }
  return %1 : tensor<?x?xf16>
}
```

## `-hfusion-convert-generic-to-named`

_Convert linalg generic ops to linalg named ops and hfusion named ops._

## `-hfusion-decompose`

_Decompose ops that implement AggregatedOpInterface._

### Options

```text
-hfusion-decompose-phase : Specify which decompose phase to apply.
```

## `-hfusion-decompose-multi`

_Decompose multi ops into single ones_

## `-hfusion-downgrade-fp64`

_Downgrade fp64 constants to fp32_

## `-hfusion-drop-symbols`

_Drop ranked tensor symbols from operations_

## `-hfusion-eliminate-duplicate-funcs`

_Eliminate duplicate functions after fusion_

## `-hfusion-flatten-ops`

_Flatten linalg and hfusion ops._

### Options

```text
-flatten-mode        : Flatten mode; tidy mode will do an analysis on the entire function
-skip-host           : Whether to skip the host function or not
-multi-dynamic-shape : Whether to collapse multiple dynamic shapes or not
```

## `-hfusion-fold-symbolic-dim`

_Replace tensor.dim source operands with hfusion::SymbolicDimOp_

## `-hfusion-fuse-ops`

_HFusion fuse operations on tensors_

### Options

```text
-output-mode                : Outlined function output mode (default is multi; can also use single or single-aggr)
-fusion-mode                : Fusion kind is determined by label
-always-inline              : Enable always-inline for the outlined function.
-move-out-to-param          : Whether to move the out tensor to params or not
-max-horizontal-fusion-size : Maximum horizontal (non-dependent) fusion allowed; -1 for unlimited attempts at horizontal fusion
-multi-kernel               : When disabled, the graph must fuse into a single kernel; when enabled, outline multiple kernels.
-enable-symbol-analysis     : Enable symbol dialect analysis.
```

## `-hfusion-hoist-tensor-empty`

_Hoist tensor.empty to func parameters and merge into one parameter_

This pass merges all tensor.empty ops into one func parameter.
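
As an illustrative sketch only (the merged parameter layout and slice offsets shown here are assumptions, not the pass's guaranteed output), the transformation can be pictured as:

```mlir
// Before: two local tensor.empty allocations inside the kernel.
func.func @kernel(%arg0: tensor<16xf32>) -> tensor<16xf32> {
  %e0 = tensor.empty() : tensor<16xf32>
  %e1 = tensor.empty() : tensor<16xf32>
  // ... uses of %e0 and %e1 ...
  return %arg0 : tensor<16xf32>
}

// After: the empties are hoisted into a single merged func parameter,
// and each original buffer is recovered as a slice of it.
func.func @kernel(%arg0: tensor<16xf32>, %buf: tensor<32xf32>) -> tensor<16xf32> {
  %e0 = tensor.extract_slice %buf[0] [16] [1] : tensor<32xf32> to tensor<16xf32>
  %e1 = tensor.extract_slice %buf[16] [16] [1] : tensor<32xf32> to tensor<16xf32>
  // ... uses of %e0 and %e1 ...
  return %arg0 : tensor<16xf32>
}
```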
## `-hfusion-infer-func-fusion-kind`

_Infer fusion kind for functions_

## `-hfusion-infer-out-shapes`

_Generate the out tensor's shape function for the kernel_

## `-hfusion-inline-brc`

_Inline broadcast-like ops._

## `-hfusion-legalize-bf16`

_Normalize BF16 to FP32._

## `-hfusion-legalize-bool`

_Cast int8 to int1 for input and int1 to int8 for output._

## `-hfusion-normalize-ops`

_Normalize HFusion ops_

## `-hfusion-normalize-slice-ops`

_Normalize slice ops._

### Options

```text
-skip-aligned-slice : Skip the FoldInsertSliceToConcat pattern for aligned slices.
```

## `-hfusion-outline-single-op`

_Outline single linalg ops into kernels._

### Options

```text
-move-out-to-param : Whether to move the out tensor to params or not
```

## `-hfusion-pack-tiling-data`

_Pack dynamic tiling information into a struct._

Pack the tiling information into a struct.

### Options

```text
-include-symbols                      : Comma-separated list of symbols to which this transformation should apply. If empty, the default behavior is to apply the transformation to all functions.
-emit-get-tiling-struct-size-function : When enabled, a host function that returns the number of i64 tiling data is emitted.
-pack-tiling-key                      : When enabled, the tiling key is also packed into the tiling struct. Otherwise, the tiling key is directly written to a pointer.
```

## `-hfusion-recache-io`

_Recache IO_

## `-hfusion-reorder-ops`

_Reorder the ops by BFS._

## `-hfusion-simplify-ops`

_Simplify operations_

## `-hfusion-tensor-results-to-out-params`

_Move tensor results to function output parameters_

### Options

```text
-include-symbols              : Comma-separated list of symbols to which this transformation should apply. If empty, the default behavior is to apply the transformation to all functions.
-enable-manage-host-resources : Enable managing resources for host functions
```

## `-hfusion-unfold-symbolic-dim`

_Replace hfusion::SymbolicDimOp with the same symbolic arguments_

## `-hfusion-wrap-host-func`

_Create wrappers for certain host-related functions_

This pass creates wrapper functions for the host tiling func, infer-shape func, etc.

### Options

```text
-remove-unused-arguments : Whether to remove unused arguments in the host wrapper function or not
```
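
A minimal sketch of the idea, assuming a hypothetical naming scheme (the wrapper name, signatures, and forwarded placeholder below are illustrative assumptions, not the pass's actual output):

```mlir
// Original host tiling function; its second argument is never used.
func.func @tiling_func(%arg0: i64, %unused: i64) -> i64 {
  %0 = arith.addi %arg0, %arg0 : i64
  return %0 : i64
}

// Generated wrapper. With -remove-unused-arguments enabled, the unused
// argument is dropped from the wrapper's signature and a placeholder
// value is forwarded to the wrapped function instead.
func.func @tiling_func_wrapper(%arg0: i64) -> i64 {
  %c0 = arith.constant 0 : i64
  %0 = func.call @tiling_func(%arg0, %c0) : (i64, i64) -> i64
  return %0 : i64
}
```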