
# @compiler.register

The `@compiler.register` decorator registers a custom operation for MAX graphs.
For more information, see the
[MAX documentation about custom ops](https://docs.modular.com/max/develop/custom-ops/).

To define a custom operation:

- Import the `compiler` package.

- Create a struct that implements the `execute()` and (optional) `shape()`
  static methods.

- Register it using the `@compiler.register` decorator.

The following snippet shows the outline of a custom operation:

```mojo
import compiler

@compiler.register("add_vectors_custom")
struct AddVectorsCustom:

    @staticmethod
    def execute[...](...):
        pass

    @staticmethod
    def shape(...) -> IndexList:
        pass
```

The `@compiler.register` decorator takes a single argument, the name of the
custom operation, as a string. This name is used to load the custom op into your
graph.

The `execute()` method usually returns its output through one or more
destination-passing style (DPS) output tensors. In DPS, the calling function
passes in pre-allocated storage space for the output value(s). This allows for
more efficient memory management. For example, the graph compiler can optimize
memory use by allocating output tensors on the stack, instead of requiring
custom ops to allocate heap storage for return values.

Destination-passing style requires the graph compiler to determine the
dimensions of the output tensor(s) before executing the operation. If the
dimensions can't be determined statically, the graph compiler calls the
operation's `shape()` function to compute them.
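The idea behind destination-passing style can be sketched in plain Mojo, without any MAX types. This is only an illustration: `add_into` and `dst` are hypothetical names, and a real custom op receives `OutputTensor` arguments rather than a `List`.

```mojo
# Hypothetical illustration of destination-passing style (not MAX API code).


def add_into(mut dst: List[Float32], x: List[Float32], y: List[Float32]):
    """Writes the element-wise sum of x and y into caller-provided storage."""
    for i in range(len(dst)):
        dst[i] = x[i] + y[i]


def main():
    var x: List[Float32] = [1.0, 2.0, 3.0]
    var y: List[Float32] = [10.0, 20.0, 30.0]
    # The caller allocates the output before the call, so it controls
    # where and how the storage is created.
    var dst = List[Float32](length=3, fill=0.0)
    add_into(dst, x, y)
    print(dst[0], dst[1], dst[2])
```

Because `add_into` never allocates, the caller is free to reuse one output buffer across many calls, which is the kind of optimization the graph compiler can apply to DPS output tensors.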

The following sections describe the `execute()` and `shape()` functions.

## `execute()` function

The `execute()` function performs the actual work of the custom op. It takes the
following parameter:

- `target` (`StaticString`): The device the operation runs on. Currently, the
  value is either `"cpu"` or `"gpu"`.

Graph output and input tensors are passed to the `execute()` function as
instances of
[`OutputTensor`](https://docs.modular.com/max/api/kernels/extensibility/tensor/managed_tensor_slice/#outputtensor)
and
[`InputTensor`](https://docs.modular.com/max/api/kernels/extensibility/tensor/managed_tensor_slice/#inputtensor),
respectively. These are both type aliases for specific configurations of
[`ManagedTensorSlice`](https://docs.modular.com/max/api/kernels/extensibility/tensor/managed_tensor_slice/ManagedTensorSlice),
so they both have the same API.

In addition to input and output tensors, the function can take the following
arguments:

- Any arguments of type [`Scalar`](/docs/manual/types/#scalar-values).

- A single argument of type `DeviceContext`. This is required for GPU
  support and is also provided for CPU execution (via `CpuDeviceContext`).

```mojo
import compiler
from std.utils.index import IndexList
from max.tensor import OutputTensor, InputTensor, foreach, ManagedTensorSlice
from std.gpu.host import DeviceContext

@compiler.register("add_vectors_custom")
struct AddVectorsCustom:
    @staticmethod
    def execute[
        # "gpu" or "cpu"
        target: StaticString,
    ](
        # the first argument is the output
        out: OutputTensor,
        # starting here is the list of inputs
        x: InputTensor[dtype = out.dtype, rank = out.rank],
        y: InputTensor[dtype = out.dtype, rank = out.rank],
        # the context is needed for some GPU calls
        ctx: DeviceContext,
    ):

        @parameter
        @always_inline
        def func[width: Int](idx: IndexList[x.rank]) -> SIMD[x.dtype, width]:
            return x.load[width](idx) + y.load[width](idx)

        foreach[func, target=target](out, ctx)
```
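A `Scalar` argument can be used in the same way as the tensor inputs. The following is a sketch under assumptions, not a verified implementation: the op name `scale_vector_custom`, the struct name `ScaleVectorCustom`, the argument name `scale`, and its position after the input tensors are all hypothetical. The imports mirror the example above.

```mojo
import compiler
from std.utils.index import IndexList
from max.tensor import OutputTensor, InputTensor, foreach
from std.gpu.host import DeviceContext


# Hypothetical op: multiplies every element of x by a scalar.
@compiler.register("scale_vector_custom")
struct ScaleVectorCustom:
    @staticmethod
    def execute[
        # "gpu" or "cpu"
        target: StaticString,
    ](
        # the DPS output tensor
        output: OutputTensor,
        x: InputTensor[dtype = output.dtype, rank = output.rank],
        # Assumption: the scalar shares the output's dtype.
        scale: Scalar[output.dtype],
        ctx: DeviceContext,
    ):
        @parameter
        @always_inline
        def func[width: Int](idx: IndexList[x.rank]) -> SIMD[x.dtype, width]:
            # Broadcast the scalar to the SIMD width before multiplying.
            return x.load[width](idx) * SIMD[x.dtype, width](scale)

        foreach[func, target=target](output, ctx)
```

Tying the scalar's dtype to the output tensor keeps the multiplication free of explicit casts.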

## `shape()` function

The `shape()` function returns the dimensions of the output tensor(s).

The `shape()` function is required only if the graph compiler can't statically
determine the shape of the output tensor(s), and you don't manually annotate the
output shapes when building a graph.

The function takes the same arguments as the `execute()` function, minus the
output tensors and `DeviceContext`. It must return an
[`IndexList`](/docs/std/utils/index_/IndexList/) specifying the dimensions of
the output tensor.

For example, if the operation takes two input tensors, and the shape of the
output tensor matches the first input tensor, you could use the following
`shape()` function:

```mojo
    @staticmethod
    def shape(
        in1: InputTensor,
        in2: InputTensor,
    ) -> IndexList[in1.rank]:
        return in1.spec.shape
```
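When the output shape differs from every input shape, `shape()` can compute it from the inputs. The fragment below is a sketch for a hypothetical op that concatenates two tensors along their first dimension. It reuses the `spec.shape` access shown above and assumes that both inputs have the same rank and that the returned `IndexList` can be indexed and assigned by position.

```mojo
    @staticmethod
    def shape(
        in1: InputTensor,
        # Assumption: both inputs have the same rank.
        in2: InputTensor[rank = in1.rank],
    ) -> IndexList[in1.rank]:
        # Start from the first input's dimensions.
        var result = in1.spec.shape
        # The first dimension grows by the second input's first dimension.
        result[0] = in1.spec.shape[0] + in2.spec.shape[0]
        return result
```

As with the previous example, this fragment belongs inside a struct registered with `@compiler.register`, next to its `execute()` method.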
