diff --git a/.buildinfo b/.buildinfo
index 83ed1f5f..6964678a 100644
--- a/.buildinfo
+++ b/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: c9e032fa7f49dc809f164063dfb2e28e
+config: dabe4189cc532017a30e750cb5d38b1e
tags: 645f666f9bcd5a90fca523b33c5a78b7
diff --git a/_downloads/2b236384a146de2d4a64081ddb0a7c9a/developer_01_ir_builder.zip b/_downloads/2b236384a146de2d4a64081ddb0a7c9a/developer_01_ir_builder.zip
index 37c81a1b..5736fe5f 100644
Binary files a/_downloads/2b236384a146de2d4a64081ddb0a7c9a/developer_01_ir_builder.zip and b/_downloads/2b236384a146de2d4a64081ddb0a7c9a/developer_01_ir_builder.zip differ
diff --git a/_downloads/39c6904b3f007c07e3d59200d0bf98b4/dive_03_composition.ipynb b/_downloads/39c6904b3f007c07e3d59200d0bf98b4/dive_03_composition.ipynb
new file mode 100644
index 00000000..b695a4e5
--- /dev/null
+++ b/_downloads/39c6904b3f007c07e3d59200d0bf98b4/dive_03_composition.ipynb
@@ -0,0 +1,183 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n# Kernel Composition\n\n**Author**: Hongzheng Chen (hzchen@cs.cornell.edu)\n\nThis document will discuss kernel composition.\nIn the previous tutorials, we have seen how to write a simple kernel.\nHowever, in real applications, we often need to compose multiple kernels together.\n\nIn the following example, we define a ``matrix_add`` and a ``gemm`` kernel, and wrap them into a ``top``-level function.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "import allo\nfrom allo.ir.types import int32, float32\n\nM, K, N = 32, 32, 32\n\n\ndef matrix_add(A: int32[M, N]) -> int32[M, N]:\n B: int32[M, N] = 0\n for i, j in allo.grid(M, N):\n B[i, j] = A[i, j] + 1\n return B\n\n\ndef gemm(A: int32[M, K], B: int32[K, N]) -> int32[M, N]:\n C: int32[M, N] = 0\n for i, j in allo.grid(M, N):\n for k in allo.reduction(K):\n C[i, j] += A[i, k] * B[k, j]\n return C\n\n\ndef top(A: int32[M, K], B: int32[K, N]) -> int32[M, N]:\n C = gemm(A, B)\n D = matrix_add(C)\n return D"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Different teams or people can then work on different parts of the code and optimize each kernel.\nWe first create a schedule for the ``matrix_add`` kernel, and add several optimizations.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "s1 = allo.customize(matrix_add)\ns1.pipeline(\"j\")\nprint(s1.module)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Then we create a schedule for the ``gemm`` kernel and optimize it.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "s2 = allo.customize(gemm)\ns2.reorder(\"k\", \"j\")\ns2.buffer_at(s2.C, axis=\"i\")\ns2.pipeline(\"j\")\nprint(s2.module)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Notice that now we only optimize the separate kernels but do not incorporate them into the top-level function, as shown in the following printed module.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "s = allo.customize(top)\nprint(s.module)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Therefore, after each part has been optimized, we need to explicitly *compose* them together.\nIn Allo, we can use the ``.compose()`` primitive to compose the schedules together into the parent function.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "s.compose([s1, s2])\nprint(s.module)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can see that the schedules for the ``matrix_add`` and ``gemm`` kernels are both correctly optimized in the top-level function.\n\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Template Composition\nSometimes we may define template kernels and invoke the kernel with different template arguments. Allo provides an *id* option to specify the exact kernel to be composed.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "def kernel[T_in, T_out, S](A: \"T_in[S]\") -> \"T_out[S]\":\n B: T_out[S] = 0\n for i in range(S):\n with allo.meta_if(T_out == int32):\n B[i] = A[i] + 1\n with allo.meta_else():\n B[i] = A[i] * 2\n return B\n\n\ndef top2(A: int32[M]) -> float32[M]:\n C = kernel[int32, int32, M, \"K1\"](A)\n D = kernel[int32, float32, M, \"K2\"](C)\n return D"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Specifically, the last argument of the template kernel is the *id* of the kernel. Later on we can use this ID for distinguishing different kernels during composition.\nWe also customize the two template kernels with different optimizations first.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "s1 = allo.customize(kernel, instantiate=[int32, int32, M])\ns1.unroll(\"i\", factor=4)\nprint(s1.module)\n\ns2 = allo.customize(kernel, instantiate=[int32, float32, M])\ns2.pipeline(\"i\")\nprint(s2.module)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Finally, we compose the two template kernels into the top-level function with the ID specified.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "s = allo.customize(top2)\ns.compose(s1, id=\"K1\")\ns.compose(s2, id=\"K2\")\nprint(s.module)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can see from the printed module that the loop in the first kernel is unrolled by a factor of 4, and the loop in the second kernel is pipelined.\n\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.5"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/_downloads/4569d89feac47262c8c4e3e128da7a7e/dive_04_features.zip b/_downloads/4569d89feac47262c8c4e3e128da7a7e/dive_04_features.zip
new file mode 100644
index 00000000..bc7cd824
Binary files /dev/null and b/_downloads/4569d89feac47262c8c4e3e128da7a7e/dive_04_features.zip differ
diff --git a/_downloads/48b69635df4cfe1643d9d6b9bdf6cd79/tutorial_01_get_started.zip b/_downloads/48b69635df4cfe1643d9d6b9bdf6cd79/tutorial_01_get_started.zip
index 5c881bf2..49c413eb 100644
Binary files a/_downloads/48b69635df4cfe1643d9d6b9bdf6cd79/tutorial_01_get_started.zip and b/_downloads/48b69635df4cfe1643d9d6b9bdf6cd79/tutorial_01_get_started.zip differ
diff --git a/_downloads/4fba383e419c1fc1ea22179140eb2d12/dive_01_data_types.py b/_downloads/4fba383e419c1fc1ea22179140eb2d12/dive_01_data_types.py
new file mode 100644
index 00000000..180bf685
--- /dev/null
+++ b/_downloads/4fba383e419c1fc1ea22179140eb2d12/dive_01_data_types.py
@@ -0,0 +1,114 @@
+# Copyright Allo authors. All Rights Reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+"""
+Data Types and Type Casting
+===========================
+
+**Author**: Hongzheng Chen (hzchen@cs.cornell.edu)
+
+This document will discuss the Allo-supported data types in detail.
+All the data types are defined in the ``allo.ir.types`` module.
+"""
+
+import allo
+from allo.ir.types import int16, int32, float32, Int, UInt, Float, Fixed
+
+##############################################################################
+# Currently, Allo supports three base data types for mathematical operations:
+#
+# - Integers: ``Int(bitwidth)``, ``UInt(bitwidth)``
+# - Floating points: ``Float(bitwidth)`` (only support 16, 32, and 64 bits)
+# - Fixed points: ``Fixed(bitwidth, frac)``, ``UFixed(bitwidth, frac)``
+#
+# For example, one can declare a 15-bit integer as ``Int(15)`` and an unsigned 8-bit fixed-point number with 3 fractional bits as ``UFixed(8, 3)``.
+# For all the C/C++ supported data types, we provide shorthands like ``float32`` and ``int16`` to easily declare them.
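+# %%
+# As an illustration of what these declarations mean numerically, here is a
+# plain-Python sketch (not Allo API) that computes the representable ranges of
+# ``Int(15)`` and ``UFixed(8, 3)``:

```python
# Plain-Python illustration (not Allo API) of the value ranges implied by
# Allo's type declarations.
def int_range(bits):
    # Int(bits): signed two's-complement range
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def ufixed_range(bits, frac):
    # UFixed(bits, frac): unsigned, step 2**-frac, max (2**bits - 1) * step
    step = 2.0 ** -frac
    return 0.0, (2 ** bits - 1) * step, step


print(int_range(15))       # (-16384, 16383)
print(ufixed_range(8, 3))  # (0.0, 31.875, 0.125)
```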
+
+# %%
+# Notice that, unlike native Python, Allo requires the program to be **strongly and statically typed**.
+# The variable types are either declared explicitly or inferred from the context.
+# For a variable that first appears in the program, we should declare it with an expected data type using Python's type hint notation:
+
+a: int32
+
+# %%
+# Once the data types are defined, an important consideration is how to handle
+# operations between variables of different types. Allo supports two types of casting:
+# (1) implicit casting that is automatically done by the Allo compiler;
+# and (2) explicit casting that is manually done by the user.
+
+##############################################################################
+# Implicit Casting
+# ----------------
+# Allo has a strong type system that follows the `MLIR convention <https://mlir.llvm.org/docs/Dialects/ArithOps/>`_ to enforce that the operand types of arithmetic operations are the same.
+# However, it is burdensome for users to cast the variables every time, and it is also error-prone to avoid overflow when performing computations.
+# Therefore, Allo is equipped with builtin casting rules to automatically cast the variables to the same type before the operation, which is called *implicit casting*.
+# An example is shown below:
+
+
+def add(a: int32, b: int32) -> int32:
+ return a + b
+
+
+s = allo.customize(add)
+print(s.module)
+
+# %%
+# We can see that ``a`` and ``b`` are first cast to ``int33``, added
+# together, and converted back to ``int32``.
+# This is to avoid overflow and is automatically inferred by the Allo compiler.
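+# %%
+# To see why one extra bit suffices, here is a plain-Python sketch (not Allo
+# code) of the overflow that the widening to ``int33`` avoids:

```python
# Plain-Python sketch: the sum of two 32-bit signed values always fits in
# 33 signed bits, but not necessarily in 32.
INT32_MAX = 2**31 - 1


def wrap32(x):
    # truncate to 32-bit two's complement (what a plain int32 add would do)
    x &= (1 << 32) - 1
    return x - (1 << 32) if x >= (1 << 31) else x


full = INT32_MAX + INT32_MAX  # exact sum, needs 33 bits
print(full)                   # 4294967294
print(wrap32(full))           # -2 (overflowed if kept in int32)
```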
+
+
+##############################################################################
+# Explicit Casting
+# ----------------
+# One can also explicitly cast a variable to a specific type by creating an intermediate variable,
+# or by using Python's built-in functions ``float()`` and ``int()`` to cast it to ``float32`` or ``int32``.
+# Another example is shown below:
+
+
+def cast(a: int32) -> int16:
+ b: float32 = a # explicit
+ c: float32 = b * 2
+ d: float32 = float(a) * 2
+ e: int16 = c + d
+ return e
+
+
+s = allo.customize(cast)
+print(s.module)
+
+# %%
+# By explicitly creating an intermediate variable ``b``, we can cast the ``int32`` variable ``a`` to the desired floating-point type.
+# Similarly, calling ``float(a)`` can also cast ``a`` to a floating-point type.
+#
+# .. note::
+#
+#    The explicit casting between integers and floating points stated above preserves the value, but the precision may change.
+#    If you want to use a union type to represent both integers and floating points, please use the ``.bitcast()`` API instead. For example, ``a.bitcast()`` can convert ``int32`` to ``float32`` representation with the bit pattern preserved.
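+# %%
+# A plain-Python analogue of what ``.bitcast()`` does conceptually, using the
+# ``struct`` module to reinterpret bits (illustration only, not the Allo API):

```python
import struct


# Reinterpret the 32 bits of an int as a float, preserving the bit pattern
# (the value changes, unlike a value-preserving cast).
def bitcast_i32_to_f32(x):
    return struct.unpack("<f", struct.pack("<i", x))[0]


print(bitcast_i32_to_f32(0x3F800000))  # 1.0 (the bit pattern of float 1.0)
```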
+
+##############################################################################
+# Bit Operations
+# --------------
+# As hardware accelerators have the ability to manipulate each bit of the data, Allo supports bit operations on
+# those integer types. For example, we can access a specific bit in an integer ``a`` using the indexing operator:
+#
+# .. code-block:: python
+#
+# a[15]
+
+# %%
+# We can also extract a chunk of bits from an integer using the slicing operator:
+#
+# .. code-block:: python
+#
+# a[0:16]
+#
+# .. note::
+#
+# Allo follows the Python convention that the upper bound is not included, so ``[0:16]`` means
+# extracting the first 16 bits, which is different from the Xilinx HLS convention that uses ``[0:15]``
+# to indicate the first 16 bits.
+
+# %%
+# Not only constant values but also variables can be used as the index or the slice range.
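+# %%
+# The bit operations above can be mimicked in plain Python to check what they
+# compute (illustration only; in Allo they map to hardware bit-select logic):

```python
# Plain-Python analogue of Allo's bit indexing and slicing on an integer.
a = 0b1010_0000_1111_0101  # 16-bit example value (0xA0F5)

bit15 = (a >> 15) & 1        # a[15]: the single bit at position 15
low16 = a & ((1 << 16) - 1)  # a[0:16]: bits [0, 16), i.e. the low 16 bits
print(bit15, hex(low16))     # 1 0xa0f5
```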
diff --git a/_downloads/5c3db288c9103701a8cc33d4c4f30066/dive_03_composition.zip b/_downloads/5c3db288c9103701a8cc33d4c4f30066/dive_03_composition.zip
new file mode 100644
index 00000000..ecea986b
Binary files /dev/null and b/_downloads/5c3db288c9103701a8cc33d4c4f30066/dive_03_composition.zip differ
diff --git a/_downloads/68e0932078b39343e70c899a03d3ae7c/dive_01_data_types.ipynb b/_downloads/68e0932078b39343e70c899a03d3ae7c/dive_01_data_types.ipynb
new file mode 100644
index 00000000..690f7223
--- /dev/null
+++ b/_downloads/68e0932078b39343e70c899a03d3ae7c/dive_01_data_types.ipynb
@@ -0,0 +1,146 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n# Data Types and Type Casting\n\n**Author**: Hongzheng Chen (hzchen@cs.cornell.edu)\n\nThis document will discuss the Allo-supported data types in detail.\nAll the data types are defined in the ``allo.ir.types`` module.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "import allo\nfrom allo.ir.types import int16, int32, float32, Int, UInt, Float, Fixed"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Currently, Allo supports three base data types for mathematical operations:\n\n- Integers: ``Int(bitwdith)``, ``UInt(bitwidth)``\n- Floating points: ``Float(bitwidth)`` (only support 16, 32, and 64 bits)\n- Fixed points: ``Fixed(bitwidth, frac)``, ``UFixed(bitwidth, frac)``\n\nFor example, one can declare a 15-bit integer as ``Int(15)`` and an unsigned 8-bit fixed-point number with 3 fractional bits as ``UFixed(8, 3)``.\nFor all the C/C++ supported data types, we provide shorthands like ``float32`` and ``int16`` to easily declare them.\n\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Notice different from native Python, Allo requires the program to be **strongly and statically typed**.\nThe variable types are either declared explicitly or inferred from the context.\nFor a variable that first appears in the program, we should declare it with an expected data type using Python's type hint notation:\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "a: int32"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Once the data types are defined, an important consideration is how to handle\noperations between variables of different types. Allo supports two types of casting:\n(1) implicit casting that is automatically done by the Allo compiler;\nand (2) explicit casting that is manually done by the user.\n\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Implicit Casting\nAllo has a strong type system that follows the [MLIR convention](https://mlir.llvm.org/docs/Dialects/ArithOps/) to enforce the operand types are the same for the arithmetic operations.\nHowever, it is burdensome for users to cast the variables every time, and it is also error-prone to avoid overflow when performing computations.\nTherefore, Allo is equipped with builtin casting rules to automatically cast the variables to the same type before the operation, which is called *implicit casting*.\nAn example is shown below:\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "def add(a: int32, b: int32) -> int32:\n return a + b\n\n\ns = allo.customize(add)\nprint(s.module)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can see that ``a`` and ``b`` are firstly casted to ``int33``, added\ntogether, and converted back to ``int32``.\nThis is to avoid overflow and is automatically inferred by the Allo compiler.\n\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Explicit Casting\nOne can also explicitly cast the variable to a specific type by creating an intermediate variable,\nor use Python-builtin functions like ``float()`` and ``int()`` to explicitly cast a variable to ``float32`` or ``int32``.\nAnother example is shown below:\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "def cast(a: int32) -> int16:\n b: float32 = a # explicit\n c: float32 = b * 2\n d: float32 = float(a) * 2\n e: int16 = c + d\n return e\n\n\ns = allo.customize(cast)\nprint(s.module)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "By explicitly creating an intermediate variable ``b``, we can cast the ``int32`` variable ``a`` to the desired floating-point type.\nSimilarly, calling ``float(a)`` can also cast ``a`` to a floating-point type.\n\n
Note
The above stated explicit casting between integers and floating points preserves the value but the precision may be changed.\n If you want to use a union type to represent both integers and floating points, please use the `.bitcast()` API instead. For example, ``a.bitcast()`` can convert ``int32`` to ``float32`` representation with the bit pattern preserved.
\n\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Bit Operations\nAs hardware accelerators have ability to manipulate each bit of the data, Allo supports bit operations on\nthose integer types. For example, we can access a specific bit in an integer ``a`` using the indexing operator:\n\n```python\na[15]\n```\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can also extract a chunk of bits from an integer using the slicing operator:\n\n```python\na[0:16]\n```\n
Note
Allo follows the Python convention that the upper bound is not included, so ``[0:16]`` means\n extracting the first 16 bits, which is different from the Xilinx HLS convention that uses ``[0:15]``\n to indicate the first 16 bits.
\n\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Not only constant values are supported, but also variables can be used as the index or the slice range.\n\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.5"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/_downloads/77583a2b4d388a5f4b2bf7b3eec828d2/dive_01_data_types.zip b/_downloads/77583a2b4d388a5f4b2bf7b3eec828d2/dive_01_data_types.zip
new file mode 100644
index 00000000..f8f8a0b2
Binary files /dev/null and b/_downloads/77583a2b4d388a5f4b2bf7b3eec828d2/dive_01_data_types.zip differ
diff --git a/_downloads/849a5b43539eae829c9f79867111880a/dive_02_template.zip b/_downloads/849a5b43539eae829c9f79867111880a/dive_02_template.zip
new file mode 100644
index 00000000..e06fcf76
Binary files /dev/null and b/_downloads/849a5b43539eae829c9f79867111880a/dive_02_template.zip differ
diff --git a/_downloads/8d3af32bb0bffe35477d27ae08e595fe/tutorial_01_get_started.py b/_downloads/8d3af32bb0bffe35477d27ae08e595fe/tutorial_01_get_started.py
index 815d271d..253d0411 100644
--- a/_downloads/8d3af32bb0bffe35477d27ae08e595fe/tutorial_01_get_started.py
+++ b/_downloads/8d3af32bb0bffe35477d27ae08e595fe/tutorial_01_get_started.py
@@ -34,8 +34,10 @@
# %%
# We then define a function that takes two 32x32 matrices as inputs and
# returns a 32x32 matrix as output. The variable declaration is defined
-# as ``: []``. We require **strict type annotation** in
-# Allo's kernels, which is different from directly programming in Python.
+# as ``: []``, and the function type is defined as
+# ``(, , ...) -> ``.
+# We require **strict type annotation** in Allo's kernels, which is different
+# from directly programming in Python.
#
# Inside the kernel, we provide a shorthand for the loop iterator. For example,
# ``for i, j, k in allo.grid(32, 32, 32)`` is equivalent to the following
diff --git a/_downloads/90b883f891c63f481ffa4756cd7e0781/dive_04_features.ipynb b/_downloads/90b883f891c63f481ffa4756cd7e0781/dive_04_features.ipynb
new file mode 100644
index 00000000..a90ba5e3
--- /dev/null
+++ b/_downloads/90b883f891c63f481ffa4756cd7e0781/dive_04_features.ipynb
@@ -0,0 +1,86 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n# Other Features\n\n**Author**: Hongzheng Chen (hzchen@cs.cornell.edu)\n\nThis document will discuss other features that are not covered in the previous tutorials.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Dynamic Shapes\nIn some cases, the shape of the tensor is not known at compile time, so we can use ``[...]`` to represent the dynamic shape.\nFrom the generated MLIR module, we can see it has a ``\"?\"`` in the shape of the tensor, which means the shape is not predefined,\nbut we can still run the LLVM module with arbitrary shapes of NumPy arrays.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "import allo\nfrom allo.ir.types import int32, float32\nimport numpy as np\n\n\ndef kernel(A: float32[...], B: float32[...], size: int32):\n for i in range(size):\n B[i] = A[i]\n\n\ns = allo.customize(kernel)\nprint(s.module)\nnp_A = np.random.random((256,)).astype(np.float32)\nallo_A = np.zeros((256,)).astype(np.float32)\nmod = s.build()\nmod(np_A, allo_A, 256)\nnp.testing.assert_allclose(np_A, allo_A)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can also check the generated HLS code that the arguments are declared as pointers.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "code = s.build(target=\"vhls\")\nprint(code)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Tuple Return\nAnother feature is the tuple support. As in Python, we can return multiple values from a function, Allo\nalso supports this by explicitly specifying the return type as a tuple.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "def callee(a: float32, b: float32) -> (float32, float32):\n c: float32 = a + b\n d: float32 = a - b\n return c, d\n\n\ndef kernel(A: float32[10], B: float32[10]) -> (float32[10], float32[10]):\n C: float32[10] = 0\n D: float32[10] = 0\n for i in range(10):\n C[i], D[i] = callee(A[i], B[i])\n return C, D\n\n\ns = allo.customize(kernel)\nprint(s.module)\nmod = s.build()\nnp_A = np.random.random((10,)).astype(np.float32)\nnp_B = np.random.random((10,)).astype(np.float32)\nnp_C, np_D = mod(np_A, np_B)\nnp_C_ref = np.zeros((10,), dtype=np.float32)\nnp_D_ref = np.zeros((10,), dtype=np.float32)\nfor i in range(10):\n np_C_ref[i], np_D_ref[i] = callee(np_A[i], np_B[i])\nnp.testing.assert_allclose(np_C, np_C_ref)\nnp.testing.assert_allclose(np_D, np_D_ref)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.5"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/_downloads/9b041eab9a2cc4c12c883027dcc37a54/dive_02_template.py b/_downloads/9b041eab9a2cc4c12c883027dcc37a54/dive_02_template.py
new file mode 100644
index 00000000..d0868c4c
--- /dev/null
+++ b/_downloads/9b041eab9a2cc4c12c883027dcc37a54/dive_02_template.py
@@ -0,0 +1,82 @@
+# Copyright Allo authors. All Rights Reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+"""
+Template Kernels
+================
+
+**Author**: Hongzheng Chen (hzchen@cs.cornell.edu)
+
+This document explains how to write a template kernel in Allo.
+Template kernels are useful when we need to reuse a kernel with different data types or when certain computation patterns depend on specific constants.
+By leveraging template kernels, we can achieve greater flexibility and reusability in the code.
+"""
+
+import allo
+from allo.ir.types import int32, float32
+
+# %%
+# We follow Python's convention of using a *type variable* to define a template kernel.
+# Specifically, the type variable is specified after the function name using square brackets: ``def kernel[T](...)``, and the type variable can be used in the function signature and body.
+# Importantly, as the native Python interpreter does not support Allo's type declaration (i.e., base type + shape), we need to use string annotations like ``"T[10]"`` to specify the type of the variables.
+# Otherwise, it will raise a type error.
+#
+# In the following, we define a simple addition function that adds 1 to each element of the input array.
+# To invoke the kernel with a specific data type, we can use the ``instantiate`` argument in the ``allo.customize`` function.
+
+
+def kernel[T](A: "T[10]") -> "T[10]":
+ B: T[10]
+ for i in range(10):
+ B[i] = A[i] + 1
+ return B
+
+
+s = allo.customize(kernel, instantiate=[int32])
+print(s.module)
+
+# %%
+# We can see that the kernel is specialized with the given ``int32`` data type.
+# Similarly, we can directly declare a new kernel by specifying ``float32`` as the data type.
+
+s = allo.customize(kernel, instantiate=[float32])
+print(s.module)
+
+# %%
+# If we want to specialize not only the data type but also the shape of the array, we can provide another type variable and pass it to the ``instantiate`` argument.
+# Note that we can use the ``: base_type`` notation to constrain the type variable; here we constrain ``M`` to be an integer.
+
+
+def kernel2[T, M: int32](A: "T[M]") -> "T[M]":
+ B: T[M]
+ for i in range(M):
+ B[i] = A[i] + 1
+ return B
+
+
+s = allo.customize(kernel2, instantiate=[int32, 20])
+print(s.module)
+
+# %%
+# Furthermore, Allo's templates also enable metaprogramming that evaluates type variables at compile time.
+# Specifically, we can use ``allo.meta_if``, ``allo.meta_elif``, and ``allo.meta_else`` to conditionally generate code based on the type variables.
+# Just make sure the conditions can be evaluated at compile time.
+
+
+def kernel3[T, M: int32](A: "T[M]") -> "T[M]":
+ B: T[M]
+ for i in range(M):
+ with allo.meta_if(T == int32):
+ B[i] = A[i] + 1
+ with allo.meta_else():
+ B[i] = A[i] - 1
+ return B
+
+
+# %%
+# In the final generated code, we can see that only a single branch is generated based on the given data type.
+
+s = allo.customize(kernel3, instantiate=[int32, 20])
+print(s.module)
+s = allo.customize(kernel3, instantiate=[float32, 20])
+print(s.module)
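+
+# %%
+# The compile-time specialization above can be mimicked in plain Python. A
+# hypothetical sketch (not the Allo API) in which the branch on the type
+# variable is resolved once, when the kernel is specialized:

```python
# Hypothetical plain-Python analogue of meta_if/meta_else (not Allo API):
# the branch on the type variable is resolved at specialization time, so
# each specialized kernel contains only one branch.
def specialize(T):
    if T is int:  # evaluated once, at "compile time"
        return lambda A: [a + 1 for a in A]
    return lambda A: [a - 1 for a in A]


k_int = specialize(int)
k_float = specialize(float)
print(k_int([1, 2, 3]))     # [2, 3, 4]
print(k_float([1.0, 2.0]))  # [0.0, 1.0]
```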
diff --git a/_downloads/9fbd96ba55c84b58bccde28bb525c3ff/developer_02_mlir.zip b/_downloads/9fbd96ba55c84b58bccde28bb525c3ff/developer_02_mlir.zip
index b1ad6292..c7d6b5b1 100644
Binary files a/_downloads/9fbd96ba55c84b58bccde28bb525c3ff/developer_02_mlir.zip and b/_downloads/9fbd96ba55c84b58bccde28bb525c3ff/developer_02_mlir.zip differ
diff --git a/_downloads/a1303c8436389bcc90cc384bd5c2d23e/tutorial_02_vhls.zip b/_downloads/a1303c8436389bcc90cc384bd5c2d23e/tutorial_02_vhls.zip
index a6b639a4..87d20891 100644
Binary files a/_downloads/a1303c8436389bcc90cc384bd5c2d23e/tutorial_02_vhls.zip and b/_downloads/a1303c8436389bcc90cc384bd5c2d23e/tutorial_02_vhls.zip differ
diff --git a/_downloads/aac8c815d185f6d5646a9509ba2daa13/dive_03_composition.py b/_downloads/aac8c815d185f6d5646a9509ba2daa13/dive_03_composition.py
new file mode 100644
index 00000000..ef3fc175
--- /dev/null
+++ b/_downloads/aac8c815d185f6d5646a9509ba2daa13/dive_03_composition.py
@@ -0,0 +1,120 @@
+# Copyright Allo authors. All Rights Reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+"""
+Kernel Composition
+==================
+
+**Author**: Hongzheng Chen (hzchen@cs.cornell.edu)
+
+This document will discuss kernel composition.
+In the previous tutorials, we have seen how to write a simple kernel.
+However, in real applications, we often need to compose multiple kernels together.
+
+In the following example, we define a ``matrix_add`` and a ``gemm`` kernel, and wrap them into a ``top``-level function.
+"""
+
+import allo
+from allo.ir.types import int32, float32
+
+M, K, N = 32, 32, 32
+
+
+def matrix_add(A: int32[M, N]) -> int32[M, N]:
+ B: int32[M, N] = 0
+ for i, j in allo.grid(M, N):
+ B[i, j] = A[i, j] + 1
+ return B
+
+
+def gemm(A: int32[M, K], B: int32[K, N]) -> int32[M, N]:
+ C: int32[M, N] = 0
+ for i, j in allo.grid(M, N):
+ for k in allo.reduction(K):
+ C[i, j] += A[i, k] * B[k, j]
+ return C
+
+
+def top(A: int32[M, K], B: int32[K, N]) -> int32[M, N]:
+ C = gemm(A, B)
+ D = matrix_add(C)
+ return D
+
+
+# %%
+# Different teams or people can then work on different parts of the code and optimize each kernel.
+# We first create a schedule for the ``matrix_add`` kernel, and add several optimizations.
+
+s1 = allo.customize(matrix_add)
+s1.pipeline("j")
+print(s1.module)
+
+# %%
+# Then we create a schedule for the ``gemm`` kernel and optimize it.
+
+s2 = allo.customize(gemm)
+s2.reorder("k", "j")
+s2.buffer_at(s2.C, axis="i")
+s2.pipeline("j")
+print(s2.module)
+
+# %%
+# Notice that so far we have only optimized the separate kernels but have not incorporated them into the top-level function, as shown in the following printed module.
+
+s = allo.customize(top)
+print(s.module)
+
+# %%
+# Therefore, after each part has been optimized, we need to explicitly *compose* them together.
+# In Allo, we can use the ``.compose()`` primitive to compose the schedules together into the parent function.
+
+s.compose([s1, s2])
+print(s.module)
+
+# %%
+# We can see that the schedules for the ``matrix_add`` and ``gemm`` kernels are both correctly optimized in the top-level function.
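+
+# %%
+# As a sanity check of the composed semantics, ``top`` computes
+# ``matrix_add(gemm(A, B))``, i.e. ``A @ B + 1``. A NumPy sketch of the
+# reference result (not part of the Allo flow):

```python
import numpy as np

# NumPy reference for the composed pipeline: D = (A @ B) + 1
M, K, N = 32, 32, 32
A = np.random.randint(0, 8, size=(M, K)).astype(np.int32)
B = np.random.randint(0, 8, size=(K, N)).astype(np.int32)
D_ref = A @ B + 1
print(D_ref.shape)  # (32, 32)
```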
+
+##############################################################################
+# Template Composition
+# --------------------
+# Sometimes we may define template kernels and invoke the kernel with different template arguments. Allo provides an *id* option to specify the exact kernel to be composed.
+
+
+def kernel[T_in, T_out, S](A: "T_in[S]") -> "T_out[S]":
+ B: T_out[S] = 0
+ for i in range(S):
+ with allo.meta_if(T_out == int32):
+ B[i] = A[i] + 1
+ with allo.meta_else():
+ B[i] = A[i] * 2
+ return B
+
+
+def top2(A: int32[M]) -> float32[M]:
+ C = kernel[int32, int32, M, "K1"](A)
+ D = kernel[int32, float32, M, "K2"](C)
+ return D
+
+
+# %%
+# Specifically, the last argument of the template kernel is the *id* of the kernel. Later on, we can use this ID to distinguish between the kernels during composition.
+# We also customize the two template kernels with different optimizations first.
+
+s1 = allo.customize(kernel, instantiate=[int32, int32, M])
+s1.unroll("i", factor=4)
+print(s1.module)
+
+s2 = allo.customize(kernel, instantiate=[int32, float32, M])
+s2.pipeline("i")
+print(s2.module)
+
+# %%
+# Finally, we compose the two template kernels into the top-level function with the ID specified.
+
+s = allo.customize(top2)
+s.compose(s1, id="K1")
+s.compose(s2, id="K2")
+print(s.module)
+
+# %%
+# We can see from the printed module that the loop in the first kernel is unrolled by a factor of 4, and the loop in the second kernel is pipelined.
diff --git a/_downloads/addf17760130f22dafec92dedc62e16a/tutorial_01_get_started.ipynb b/_downloads/addf17760130f22dafec92dedc62e16a/tutorial_01_get_started.ipynb
index d2f89745..95e32ef5 100644
--- a/_downloads/addf17760130f22dafec92dedc62e16a/tutorial_01_get_started.ipynb
+++ b/_downloads/addf17760130f22dafec92dedc62e16a/tutorial_01_get_started.ipynb
@@ -40,7 +40,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "We then define a function that takes two 32x32 matrices as inputs and\nreturns a 32x32 matrix as output. The variable declaration is defined\nas ``: []``. We require **strict type annotation** in\nAllo's kernels, which is different from directly programming in Python.\n\nInside the kernel, we provide a shorthand for the loop iterator. For example,\n``for i, j, k in allo.grid(32, 32, 32)`` is equivalent to the following\nnested for-loop:\n\n```python\nfor i in range(32):\n for j in range(32):\n for k in range(32):\n # body\n```\nThe ``allo.grid`` API is used to define the iteration space of the loop.\nThe arguments denote the upper bounds of the loop iterators.\nNotice the above range-loop is also supported in the new Allo, so\nusers have more flexibility to define the loop structure.\n\n"
+ "We then define a function that takes two 32x32 matrices as inputs and\nreturns a 32x32 matrix as output. The variable declaration is defined\nas ``<name>: <type>[<shape>]``, and the function type is defined as\n``(<input_type>, <input_type>, ...) -> <output_type>``.\nWe require **strict type annotation** in Allo's kernels, which is different\nfrom directly programming in Python.\n\nInside the kernel, we provide a shorthand for the loop iterator. For example,\n``for i, j, k in allo.grid(32, 32, 32)`` is equivalent to the following\nnested for-loop:\n\n```python\nfor i in range(32):\n for j in range(32):\n for k in range(32):\n # body\n```\nThe ``allo.grid`` API is used to define the iteration space of the loop.\nThe arguments denote the upper bounds of the loop iterators.\nNotice the above range-loop is also supported in the new Allo, so\nusers have more flexibility to define the loop structure.\n\n"
]
},
{
diff --git a/_downloads/d58a09ade6135cf6e79cb2fe738ace28/dive_04_features.py b/_downloads/d58a09ade6135cf6e79cb2fe738ace28/dive_04_features.py
new file mode 100644
index 00000000..e087285f
--- /dev/null
+++ b/_downloads/d58a09ade6135cf6e79cb2fe738ace28/dive_04_features.py
@@ -0,0 +1,76 @@
+# Copyright Allo authors. All Rights Reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+"""
+Other Features
+==============
+
+**Author**: Hongzheng Chen (hzchen@cs.cornell.edu)
+
+This document will discuss other features that are not covered in the previous tutorials.
+"""
+
+##############################################################################
+# Dynamic Shapes
+# --------------
+# In some cases, the shape of a tensor is not known at compile time, so we can use ``[...]`` to represent a dynamic shape.
+# In the generated MLIR module, we can see a ``"?"`` in the shape of the tensor, which means the shape is not predefined,
+# but we can still run the LLVM module with NumPy arrays of arbitrary shapes.
+
+import allo
+from allo.ir.types import int32, float32
+import numpy as np
+
+
+def kernel(A: float32[...], B: float32[...], size: int32):
+ for i in range(size):
+ B[i] = A[i]
+
+
+s = allo.customize(kernel)
+print(s.module)
+np_A = np.random.random((256,)).astype(np.float32)
+allo_A = np.zeros((256,)).astype(np.float32)
+mod = s.build()
+mod(np_A, allo_A, 256)
+np.testing.assert_allclose(np_A, allo_A)
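Because the shape is dynamic, the same built module can be reused for other input sizes without recompiling. A NumPy-level sketch of the expected behavior for several sizes (the ``mod`` call itself is elided since it needs an Allo build):

```python
import numpy as np


def kernel_ref(A, B, size):
    # reference semantics of the dynamic-shape kernel: copy the first `size` elements
    for i in range(size):
        B[i] = A[i]


# the same logic applies to arrays of any size
for n in (16, 256, 1000):
    np_A = np.random.random((n,)).astype(np.float32)
    np_B = np.zeros((n,), dtype=np.float32)
    kernel_ref(np_A, np_B, n)
```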
+
+# %%
+# We can also inspect the generated HLS code, in which the arguments are declared as pointers.
+
+code = s.build(target="vhls")
+print(code)
+
+##############################################################################
+# Tuple Return
+# ------------
+# Another feature is tuple support. As in Python, where a function can return
+# multiple values, Allo supports this by explicitly specifying the return type as a tuple.
+
+
+def callee(a: float32, b: float32) -> (float32, float32):
+ c: float32 = a + b
+ d: float32 = a - b
+ return c, d
+
+
+def kernel(A: float32[10], B: float32[10]) -> (float32[10], float32[10]):
+ C: float32[10] = 0
+ D: float32[10] = 0
+ for i in range(10):
+ C[i], D[i] = callee(A[i], B[i])
+ return C, D
+
+
+s = allo.customize(kernel)
+print(s.module)
+mod = s.build()
+np_A = np.random.random((10,)).astype(np.float32)
+np_B = np.random.random((10,)).astype(np.float32)
+np_C, np_D = mod(np_A, np_B)
+np_C_ref = np.zeros((10,), dtype=np.float32)
+np_D_ref = np.zeros((10,), dtype=np.float32)
+for i in range(10):
+ np_C_ref[i], np_D_ref[i] = callee(np_A[i], np_B[i])
+np.testing.assert_allclose(np_C, np_C_ref)
+np.testing.assert_allclose(np_D, np_D_ref)
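The elementwise reference loop above can also be written with vectorized NumPy operations, which is a convenient way to cross-check tuple-returning kernels (a sketch using the same shapes as above):

```python
import numpy as np

np_A = np.random.random((10,)).astype(np.float32)
np_B = np.random.random((10,)).astype(np.float32)
# vectorized reference for callee applied elementwise: c = a + b, d = a - b
np_C_ref = np_A + np_B
np_D_ref = np_A - np_B
```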
diff --git a/_downloads/de72dcd3242a3c85b41c9c54a3424409/dive_02_template.ipynb b/_downloads/de72dcd3242a3c85b41c9c54a3424409/dive_02_template.ipynb
new file mode 100644
index 00000000..b910fc97
--- /dev/null
+++ b/_downloads/de72dcd3242a3c85b41c9c54a3424409/dive_02_template.ipynb
@@ -0,0 +1,133 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n# Template Kernels\n\n**Author**: Hongzheng Chen (hzchen@cs.cornell.edu)\n\nThis document explains how to write a template kernel in Allo.\nTemplate kernels are useful when we need to reuse a kernel with different data types or when certain computation patterns depend on specific constants.\nBy leveraging template kernels, we can achieve greater flexibility and reusability in the code.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "import allo\nfrom allo.ir.types import int32, float32"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We follow Python's convention to use *type variable* to define a template kernel.\nSpecifically, the type variable is specified after the function name using square brackets: ``def kernel[T](...)``, and the type variable can be used in the function signature and body.\nImportantly, as the native Python interpreter does not support Allo's type declaration (i.e., base type + shape), we need to use string annotations like ``\"T[10]\"`` to specify the type of the variables.\nOtherwise, it will raise a type error.\n\nIn the following, we define a simple addition function that adds 1 to each element of the input array.\nTo invoke the kernel with a specific data type, we can use the ``instantiate`` argument in the ``allo.customize`` function.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "def kernel[T](A: \"T[10]\") -> \"T[10]\":\n B: T[10]\n for i in range(10):\n B[i] = A[i] + 1\n return B\n\n\ns = allo.customize(kernel, instantiate=[int32])\nprint(s.module)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can see that the kernel is specialized with the given ``int32`` data type.\nSimilarly, we can directly declare a new kernel by specifying ``float32`` as the data type.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "s = allo.customize(kernel, instantiate=[float32])\nprint(s.module)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If we want to specialize not only the data type but also the shape of the array, we can provide another type variable and pass it to the ``instantiate`` argument.\nNote that we also use the ``<type_var>: base_type`` notation to constrain the type of the type variable; here, ``M`` is constrained to be an integer.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "def kernel2[T, M: int32](A: \"T[M]\") -> \"T[M]\":\n B: T[M]\n for i in range(M):\n B[i] = A[i] + 1\n return B\n\n\ns = allo.customize(kernel2, instantiate=[int32, 20])\nprint(s.module)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Furthermore, Allo's template also enables metaprogramming that can evaluate type variables at compile time.\nSpecifically, we can use ``allo.meta_if``, ``allo.meta_elif``, and ``allo.meta_else`` to conditionally generate code based on the type variables.\nJust make sure the conditions can be evaluated at compile time.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "def kernel3[T, M: int32](A: \"T[M]\") -> \"T[M]\":\n B: T[M]\n for i in range(M):\n with allo.meta_if(T == int32):\n B[i] = A[i] + 1\n with allo.meta_else():\n B[i] = A[i] - 1\n return B"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In the final generated code, we can see that only a single branch is generated based on the given data type.\n\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "s = allo.customize(kernel3, instantiate=[int32, 20])\nprint(s.module)\ns = allo.customize(kernel3, instantiate=[float32, 20])\nprint(s.module)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.5"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/_images/sphx_glr_dive_01_data_types_thumb.png b/_images/sphx_glr_dive_01_data_types_thumb.png
new file mode 100644
index 00000000..8a5fed58
Binary files /dev/null and b/_images/sphx_glr_dive_01_data_types_thumb.png differ
diff --git a/_images/sphx_glr_dive_02_template_thumb.png b/_images/sphx_glr_dive_02_template_thumb.png
new file mode 100644
index 00000000..8a5fed58
Binary files /dev/null and b/_images/sphx_glr_dive_02_template_thumb.png differ
diff --git a/_images/sphx_glr_dive_03_composition_thumb.png b/_images/sphx_glr_dive_03_composition_thumb.png
new file mode 100644
index 00000000..8a5fed58
Binary files /dev/null and b/_images/sphx_glr_dive_03_composition_thumb.png differ
diff --git a/_images/sphx_glr_dive_04_features_thumb.png b/_images/sphx_glr_dive_04_features_thumb.png
new file mode 100644
index 00000000..8a5fed58
Binary files /dev/null and b/_images/sphx_glr_dive_04_features_thumb.png differ
diff --git a/_modules/allo/customize.html b/_modules/allo/customize.html
index e034f2ee..4b0822d9 100644
--- a/_modules/allo/customize.html
+++ b/_modules/allo/customize.html
@@ -5,7 +5,7 @@
allo.customize — Allo Documentation
-
+
@@ -140,6 +140,15 @@
    @wrapped_apply
-    def split(self, axis, factor):
+    def split(self, axis, factor):
        """
        `split` will find the loop with loop index `axis` and tile it with each tile size `factor`.
        The new inner loop will be named `axis.inner` and the outer loop will be named `axis.outer`.
@@ -360,7 +363,7 @@
Source code for allo.customize
    @wrapped_apply
-    def reorder(self, *args):
+    def reorder(self, *args):
        """
        Reorders nested loops with indices listed in `args` such that the outermost loop
        is the first index listed in `args`, the second is the second outermost, and so on.
@@ -385,7 +388,7 @@
    @wrapped_apply
-    def unroll(self, axis, factor=0):
+    def unroll(self, axis, factor=0):
        """
        Unrolls a loop with loop index `axis` by `factor`.
@@ -411,7 +414,7 @@
    @wrapped_apply
-    def fuse(self, *args):
+    def fuse(self, *args):
        """
        Combines loops with indices listed in `args` into a single loop over a single index.
@@ -435,7 +438,7 @@
    @wrapped_apply
-    def partition(self, target, partition_type=Partition.Complete, dim=0, factor=0):
+    def partition(self, target, partition_type=Partition.Complete, dim=0, factor=0):
        """
        Partitions a given array; for example, if the array is `B`, this would be `<schedule>.B`.
        There are three types: `Partition.Complete`, `Partition.Block`, and `Partition.Cyclic`.
@@ -464,14 +467,15 @@
        visited_target_names = []
        visited_func_calls = []
-        def recursive_partition(inner_target):
+        def recursive_partition(inner_target):
            name = f"{inner_target.func}:{inner_target.name}"
            if name in visited_target_names:
                return
            visited_target_names.append(name)
            _, _, mlir_target = find_buffer(self.module, inner_target, self.func_args)
            # equivalent users
-            for tensor in self.use_def_chain.get_equivalent_tensors(name):
-                recursive_partition(MockBuffer(tensor.path, tensor.name))
+            if inner_target.name in self.func_args[inner_target.func]:
+                # is a function argument
+                idx = self.func_args[inner_target.func].index(inner_target.name)
+                name = f"{inner_target.func}:{idx}"
+            for buf_name in self.get_equivalent_variables(name):
+                path, buf_name = buf_name.split(":")
+                if buf_name.isdigit():
+                    # function argument
+                    buf_name = self.func_args[path][int(buf_name)]
+                recursive_partition(MockBuffer(path, buf_name))
            # calling the same function
            if isinstance(mlir_target, func_d.CallOp):
                visited_func_calls.append(mlir_target)
@@ -590,7 +602,7 @@
    @wrapped_apply
-    def buffer_at(self, target, axis):
+    def buffer_at(self, target, axis):
        """
        Creates a chip buffer to hold the values of `target` written to in the loop with index `axis`,
        instead of immediately writing them to memory.
@@ -617,7 +629,7 @@
    @wrapped_apply
-    def reshape(self, target, shape):
+    def reshape(self, target, shape):
        """
        Takes an array in the kernel, `target` (for example, if the array is `B`, `target` would be
        `<schedule>.B`), and reshapes it to the tuple `shape`. As an example, if the desired shape
        is 32 by 4 by 8, `shape` would be `(32, 4, 8)`.
@@ -639,7 +651,7 @@
    @wrapped_apply
-    def pipeline(self, axis, initiation_interval=1, rewind=False):
+    def pipeline(self, axis, initiation_interval=1, rewind=False):
        """
        Pipelines a loop with index `axis` into `initiation_interval` stages.
@@ -670,7 +682,7 @@
    @wrapped_apply
-    def parallel(self, axis):
+    def parallel(self, axis):
        """
        Instantiates a loop with index `axis` to be computed in parallel with the loops it is nested with.
@@ -691,7 +703,7 @@
    @wrapped_apply
-    def inline(self, axis=None):
+    def inline(self, axis=None):
        """
        Inlines a function `axis`.
@@ -711,7 +723,7 @@
    @wrapped_apply
-    def dataflow(self, axis):
+    def dataflow(self, axis):
        """
        Applies a "dataflow" attribute to function `axis`. This allows for parallelism
        if the given function uses streams or the `to` schedule.
@@ -731,7 +743,7 @@
    @wrapped_apply
-    def compute_at(self, from_loop, target_loop):
+    def compute_at(self, from_loop, target_loop):
        """
        If `from_loop` and `target_loop` are indices over the same range,
        `<schedule>.compute_at(from_loop, target_loop)` merges the two loops,
        taking the body of `from_loop` and appending it to the body of `target_loop`.
@@ -787,7 +799,7 @@
    @wrapped_apply
-    def reuse_at(self, target, axis):
+    def reuse_at(self, target, axis):
        """
        Takes an array in a kernel (for example, if the array is `B`, this would be `<schedule>.B`)
        accessed by index `axis`, and creates a reuse buffer to reuse values from `target`
        which are accessed in a sequentially moving window.
@@ -809,7 +821,7 @@
    @wrapped_apply
-    def to(self, target, dst, axis=None, depth=-1):
+    def to(self, target, dst, axis=None, depth=-1):
        """
        Takes an array in the kernel, `target` (for example, if the array is `B`, `target` would be
        `<schedule>.B`), and converts it into a stream. `dst` is the name of the array any value
        of `target` is written to.
@@ -871,7 +883,7 @@
    @wrapped_apply
-    def unfold(self, band_name, axes):
+    def unfold(self, band_name, axes):
        """
        Finds a set of nested loops with name `band_name`; for every `<i>` in the list `axes`,
        the `<i>`th nested loop is unfolded into a constant number of copies of its loop body.
@@ -920,7 +932,7 @@
    @wrapped_apply
-    def compose(self, schs: list, id=None, instantiate=None):
+    def compose(self, schs: list, id=None, instantiate=None):
        """
        Uses `schs`, a schedule for a kernel called in this kernel, in this kernel.
@@ -995,7 +1007,7 @@
        This is a list of objects used to instantiate the types `schs` is generic over.
        """
-        def get_name(arg):
+        def get_name(arg):
            if isinstance(arg, (LoopWrapper, MockBuffer)):
                arg = copy.copy(arg)
                orig_func_name = arg.func if arg.func is not None else sch.top_func_name
@@ -1073,7 +1085,14 @@
Apart from directly writing Allo kernels in Python, we also support integrating existing C++ HLS kernels into Allo. This feature is useful when you have an existing, optimized C++ HLS kernel that you want to integrate into Allo. The following example shows how to integrate a simple vector addition kernel written in C++ into Allo.
+
Suppose the C++ kernel header is defined in the vadd.h file:
In Allo, we can create an IP module to wrap the C++ kernel. Basically, we need to provide the top-level function name, the header files, and the implementation files. Also, currently an Allo signature is required to specify the input and output types of the kernel. Allo will automatically compile the C++ kernel and generate the corresponding Python wrapper based on the provided files and signature. The last argument link_hls determines whether the C++ compiler should link the Vitis HLS libraries (e.g., ap_int), which are only available when Vitis HLS is installed on your machine.
After creating the IP module, we can use it in Allo as a normal Python function. For example, we can directly call the vadd function to perform vector addition. The inputs and outputs will be automatically wrapped and unwrapped as NumPy arrays, which greatly simplifies the burden of complex C-Python interface management. This is also very useful when you want to debug the HLS kernels with Python data.
Moreover, the IP module can also be called in a normal Allo kernel. In the following example, we wrap the vadd function into an Allo kernel and use it to perform vector addition. The Allo kernel can then be further customized and compiled with the external C++ HLS kernel.
The process should be very similar to the original Allo workflow.
+The default target is LLVM. We can also change the backend to other compilers such as Vitis HLS by specifying the target:
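When debugging a wrapped IP module against Python data, a plain NumPy reference of the same computation is useful. Below, ``vadd_ref`` is a pure-Python stand-in for the wrapped C++ vadd kernel, not Allo's API; it only illustrates the elementwise-addition semantics the tutorial describes.

```python
import numpy as np


def vadd_ref(A, B):
    # pure-Python stand-in for the C++ vadd IP module: elementwise vector addition
    return A + B


np_A = np.random.randint(0, 100, (32,)).astype(np.int32)
np_B = np.random.randint(0, 100, (32,)).astype(np.int32)
np_C = vadd_ref(np_A, np_B)
```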
This guide will walk you through the process of translating a Python-based
Allo program to the internal MLIR representation. We will use the vector
addition example to demonstrate the process.