# 'hfusion' Dialect Passes

## `-adapt-triton-kernel`

_Adapt the Triton kernel_

### Options

```text
-hivmc-version : Specify hivmc version to resolve backward compatibility
```

## `-hfusion-add-ffts-addr`

_Add FFTS base address to func param and annotation_

### Options

```text
-force-add-ffts-addr : Force adding the FFTS base addr at the user-specified param location. The default value -1 means no insertion; 0 means insert at the first param location.
```

## `-hfusion-auto-schedule`

_Auto-schedule fused kernels._

### Options

```text
-block-dim                      : Number of blocks to use
-enable-auto-multi-buffer       : Enable auto multi-buffer
-enable-deterministic-computing : Enable deterministic computing
-max-buffer-count-tuning        : Allow maxBufferCnt tuning
-enable-count-buffer-dma-opt    : If enabled, the buffer used by DMA operations will not be reused by vector operations
-enable-manage-host-resources   : Enable managing resources for host functions
-cube-tiling-tuning             : Allow cube tiling params tuning
-external-tiling-func-path      : Auto add external tiling func
-enable-symbol-analysis         : Enable symbol analysis for tiling and fusion
```

## `-hfusion-cache-io`

_Cache input and output arguments_

## `-hfusion-cache-io-for-return-arg`

_Cache arguments that are returned directly_

## `-hfusion-compose-multi-reduce`

_Compose multi-reduce optimization_

### Options

```text
-max-compose   : Maximum number of reduces composed into a single operation; -1 is limitless
-max-dist-diff : Maximum distance difference from the common ancestor
-aggressive    : Aggressive mode will try to reshape if shapes are loosely matched
```

## `-hfusion-constantize-tiling-data`

_Propagate constants between tiling and device functions_

Propagate constants from the tiling calculation to the device function. Modifications made:

* Constant tiling data are inlined into the device function.
* Constant tiling data are removed from the tiling function.
* Constant tiling data are removed from the arguments of the device function, and the call sites are modified accordingly.
* Constant tiling data are removed from the callers of the device function, the callers of the callers, and so on.

Constraints/Assumptions:

* For all device functions sharing the same tiling function, the order of the tiling data arguments is exactly the same.
* The tiling arguments in a device function's input arguments have exactly the same order as the return values of the tiling function.

Input

```mlir
func.func @tiling_func(%arg0: tensor<?x?xf16>) -> (i64, i64) attributes {hacc.function_kind = #hacc.function_kind} {
  %ret0 = "some_calculation"() : () -> i64
  %ret1 = arith.constant 42 : i64
  return %ret0, %ret1 : i64, i64
}
func.func @device_kernel_tiling_0(%arg0: tensor<?x?xf16>, %arg1: i64 {hacc.tiling_data}, %arg2: i64 {hacc.tiling_data}) -> tensor<?x?xf16> attributes {hacc.function_kind = #hacc.function_kind, hacc.tiling_func = "tiling_func"} {
  "some_use"(%arg1) : (i64) -> ()
  "some_use"(%arg2) : (i64) -> ()
  %ret0 = "some_op"(%arg0) : (tensor<?x?xf16>) -> tensor<?x?xf16>
  return %ret0 : tensor<?x?xf16>
}
func.func @device_kernel_tiling_1(%arg0: tensor<?x?xf16>, %arg1: i64 {hacc.tiling_data}, %arg2: i64 {hacc.tiling_data}) -> tensor<?x?xf16> attributes {hacc.function_kind = #hacc.function_kind, hacc.tiling_func = "tiling_func"} {
  "some_use"(%arg1) : (i64) -> ()
  "some_use"(%arg2) : (i64) -> ()
  %ret0 = "some_op"(%arg0) : (tensor<?x?xf16>) -> tensor<?x?xf16>
  return %ret0 : tensor<?x?xf16>
}
func.func @main(%arg0: tensor<?x?xf16>, %arg1: i64 {hacc.tiling_data}, %arg2: i64 {hacc.tiling_data}) -> tensor<?x?xf16> attributes {hacc.function_kind = #hacc.function_kind} {
  %0 = arith.index_castui %arg1 : i64 to index
  %1 = scf.index_switch %0 -> tensor<?x?xf16>
  case 1 {
    %2 = func.call @device_kernel_tiling_1(%arg0, %arg1, %arg2) : (tensor<?x?xf16>, i64, i64) -> tensor<?x?xf16>
    scf.yield %2 : tensor<?x?xf16>
  }
  case 0 {
    %2 = func.call @device_kernel_tiling_0(%arg0, %arg1, %arg2) : (tensor<?x?xf16>, i64, i64) -> tensor<?x?xf16>
    scf.yield %2 : tensor<?x?xf16>
  }
  default {
    %false = arith.constant false
    cf.assert %false, "Invalid tiling key"
    %2 = ub.poison : tensor<?x?xf16>
    scf.yield %2 : tensor<?x?xf16>
  }
  return %1 : tensor<?x?xf16>
}
```

Output

```mlir
func.func @tiling_func(%arg0: tensor<?x?xf16>) -> (i64) attributes {hacc.function_kind = #hacc.function_kind} {
  %ret0 = "some_calculation"() : () -> i64
  return %ret0 : i64
}
func.func @device_kernel_tiling_0(%arg0: tensor<?x?xf16>, %arg1: i64 {hacc.tiling_data}) -> tensor<?x?xf16> attributes {hacc.function_kind = #hacc.function_kind, hacc.tiling_func = "tiling_func"} {
  "some_use"(%arg1) : (i64) -> ()
  %arg2 = arith.constant 42 : i64
  "some_use"(%arg2) : (i64) -> ()
  %ret0 = "some_op"(%arg0) : (tensor<?x?xf16>) -> tensor<?x?xf16>
  return %ret0 : tensor<?x?xf16>
}
func.func @device_kernel_tiling_1(%arg0: tensor<?x?xf16>, %arg1: i64 {hacc.tiling_data}) -> tensor<?x?xf16> attributes {hacc.function_kind = #hacc.function_kind, hacc.tiling_func = "tiling_func"} {
  "some_use"(%arg1) : (i64) -> ()
  %arg2 = arith.constant 42 : i64
  "some_use"(%arg2) : (i64) -> ()
  %ret0 = "some_op"(%arg0) : (tensor<?x?xf16>) -> tensor<?x?xf16>
  return %ret0 : tensor<?x?xf16>
}
func.func @main(%arg0: tensor<?x?xf16>, %arg1: i64 {hacc.tiling_data}) -> tensor<?x?xf16> attributes {hacc.function_kind = #hacc.function_kind} {
  %0 = arith.index_castui %arg1 : i64 to index
  %1 = scf.index_switch %0 -> tensor<?x?xf16>
  case 1 {
    %2 = func.call @device_kernel_tiling_1(%arg0, %arg1) : (tensor<?x?xf16>, i64) -> tensor<?x?xf16>
    scf.yield %2 : tensor<?x?xf16>
  }
  case 0 {
    %2 = func.call @device_kernel_tiling_0(%arg0, %arg1) : (tensor<?x?xf16>, i64) -> tensor<?x?xf16>
    scf.yield %2 : tensor<?x?xf16>
  }
  default {
    %false = arith.constant false
    cf.assert %false, "Invalid tiling key"
    %2 = ub.poison : tensor<?x?xf16>
    scf.yield %2 : tensor<?x?xf16>
  }
  return %1 : tensor<?x?xf16>
}
```

## `-hfusion-convert-generic-to-named`

_Convert linalg generic ops to linalg named ops and hfusion named ops._

## `-hfusion-decompose`

_Decompose ops that implement AggregatedOpInterface._

### Options

```text
-hfusion-decompose-phase : Specify which decompose phase to apply.
```

## `-hfusion-decompose-multi`

_Decompose multi ops into single ones_

## `-hfusion-downgrade-fp64`

_Downgrade fp64 constants to fp32_

## `-hfusion-drop-symbols`

_Drop ranked tensor symbols from operations_

## `-hfusion-eliminate-duplicate-funcs`

_Eliminate duplicate functions after fusion_

## `-hfusion-flatten-ops`

_Flatten linalg and hfusion ops._

### Options

```text
-flatten-mode        : Flatten mode; tidy mode will do an analysis on the entire function
-skip-host           : Whether to skip the host function or not
-multi-dynamic-shape : Whether to collapse multiple dynamic shapes or not
```

## `-hfusion-fold-symbolic-dim`

_Replace tensor.dim source operands with hfusion::SymbolicDimOp_

## `-hfusion-fuse-ops`

_HFusion fuse operations on tensors_

### Options

```text
-output-mode                : Outlined function output mode (default is multi; can also use single or single-aggr)
-fusion-mode                : Fusion kind is determined by label
-always-inline              : Enable always-inline for the outlined function.
-move-out-to-param          : Whether to move the out tensor to params or not
-max-horizontal-fusion-size : Maximum horizontal (non-dependent) fusion allowed; -1 for unlimited attempts at horizontal fusion
-multi-kernel               : When disabled, the graph must fuse into a single kernel; when enabled, outline multiple kernels.
-enable-symbol-analysis     : Enable symbol dialect analysis.
```

## `-hfusion-hoist-tensor-empty`

_Hoist tensor.empty to func parameters and merge into one parameter_

This pass merges all tensor.empty ops into one func parameter.
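
As an illustrative sketch only (the merged parameter layout and slice offsets shown here are assumptions, not the pass's guaranteed output), the transformation can be pictured as:

```mlir
// Before: two local tensor.empty allocations inside the kernel.
func.func @kernel(%arg0: tensor<16xf32>) -> tensor<16xf32> {
  %e0 = tensor.empty() : tensor<16xf32>
  %e1 = tensor.empty() : tensor<16xf32>
  // ... uses of %e0 and %e1 ...
  return %arg0 : tensor<16xf32>
}

// After: the empties are hoisted into a single merged func parameter,
// and each original buffer is recovered as a slice of it.
func.func @kernel(%arg0: tensor<16xf32>, %buf: tensor<32xf32>) -> tensor<16xf32> {
  %e0 = tensor.extract_slice %buf[0] [16] [1] : tensor<32xf32> to tensor<16xf32>
  %e1 = tensor.extract_slice %buf[16] [16] [1] : tensor<32xf32> to tensor<16xf32>
  // ... uses of %e0 and %e1 ...
  return %arg0 : tensor<16xf32>
}
```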
## `-hfusion-infer-func-fusion-kind`

_Infer fusion kind for functions_

## `-hfusion-infer-out-shapes`

_Generate the out tensor's shape function for the kernel_

## `-hfusion-inline-brc`

_Inline broadcast-like ops._

## `-hfusion-legalize-bf16`

_Normalize BF16 to FP32._

## `-hfusion-legalize-bool`

_Cast int8 to int1 for input and int1 to int8 for output._

## `-hfusion-normalize-ops`

_Normalize HFusion ops_

## `-hfusion-normalize-slice-ops`

_Normalize slice ops._

### Options

```text
-skip-aligned-slice : Skip the FoldInsertSliceToConcat pattern for aligned slices.
```

## `-hfusion-outline-single-op`

_Outline single linalg ops into kernels._

### Options

```text
-move-out-to-param : Whether to move the out tensor to params or not
```

## `-hfusion-pack-tiling-data`

_Pack dynamic tiling information into a struct._

Pack the tiling information into a struct.

### Options

```text
-include-symbols                      : Comma-separated list of symbols to which this transformation should apply. If empty, the default behavior is to apply the transformation to all functions.
-emit-get-tiling-struct-size-function : When enabled, a host function that returns the number of i64 tiling data is emitted.
-pack-tiling-key                      : When enabled, the tiling key is also packed into the tiling struct. Otherwise, the tiling key is directly written to a pointer.
```

## `-hfusion-recache-io`

_Recache IO_

## `-hfusion-reorder-ops`

_Reorder the ops by BFS._

## `-hfusion-simplify-ops`

_Simplify operations_

## `-hfusion-tensor-results-to-out-params`

_Move tensor results to function output parameters_

### Options

```text
-include-symbols              : Comma-separated list of symbols to which this transformation should apply. If empty, the default behavior is to apply the transformation to all functions.
-enable-manage-host-resources : Enable managing resources for host functions
```

## `-hfusion-unfold-symbolic-dim`

_Replace hfusion::SymbolicDimOp with the same symbolic arguments_

## `-hfusion-wrap-host-func`

_Create wrappers for certain host-related functions_

This pass creates wrapper functions for the host tiling func, infer-shape func, etc.

### Options

```text
-remove-unused-arguments : Whether to remove unused arguments in the host wrapper function or not
```
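
A minimal sketch of the idea, assuming a hypothetical naming scheme (the wrapper name, signatures, and forwarded placeholder below are illustrative assumptions, not the pass's actual output):

```mlir
// Original host tiling function; its second argument is never used.
func.func @tiling_func(%arg0: i64, %unused: i64) -> i64 {
  %0 = arith.addi %arg0, %arg0 : i64
  return %0 : i64
}

// Generated wrapper. With -remove-unused-arguments enabled, the unused
// argument is dropped from the wrapper's signature and a placeholder
// value is forwarded to the wrapped function instead.
func.func @tiling_func_wrapper(%arg0: i64) -> i64 {
  %c0 = arith.constant 0 : i64
  %0 = func.call @tiling_func(%arg0, %c0) : (i64, i64) -> i64
  return %0 : i64
}
```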