Deploying to gh-pages from @ 9162879 🚀
Showing 60 changed files with 5,243 additions and 231 deletions.
@@ -1,4 +1,4 @@
  # Sphinx build info version 1
  # This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
- config: c9e032fa7f49dc809f164063dfb2e28e
+ config: dabe4189cc532017a30e750cb5d38b1e
  tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file modified (+0 Bytes): _downloads/2b236384a146de2d4a64081ddb0a7c9a/developer_01_ir_builder.zip
_downloads/39c6904b3f007c07e3d59200d0bf98b4/dive_03_composition.ipynb (183 additions, 0 deletions)
@@ -0,0 +1,183 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Kernel Composition\n\n**Author**: Hongzheng Chen ([email protected])\n\nThis document discusses kernel composition.\nIn the previous tutorials, we have seen how to write a simple kernel.\nIn real applications, however, we often need to compose multiple kernels together.\n\nIn the following example, we define a ``matrix_add`` and a ``gemm`` kernel, and wrap them into a ``top``-level function.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import allo\nfrom allo.ir.types import int32, float32\n\nM, K, N = 32, 32, 32\n\n\ndef matrix_add(A: int32[M, N]) -> int32[M, N]:\n    B: int32[M, N] = 0\n    for i, j in allo.grid(M, N):\n        B[i, j] = A[i, j] + 1\n    return B\n\n\ndef gemm(A: int32[M, K], B: int32[K, N]) -> int32[M, N]:\n    C: int32[M, N] = 0\n    for i, j in allo.grid(M, N):\n        for k in allo.reduction(K):\n            C[i, j] += A[i, k] * B[k, j]\n    return C\n\n\ndef top(A: int32[M, K], B: int32[K, N]) -> int32[M, N]:\n    C = gemm(A, B)\n    D = matrix_add(C)\n    return D"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Different teams or people can then work on different parts of the code and optimize each kernel.\nWe first create a schedule for the ``matrix_add`` kernel and add several optimizations.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"s1 = allo.customize(matrix_add)\ns1.pipeline(\"j\")\nprint(s1.module)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we create a schedule for the ``gemm`` kernel and optimize it.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"s2 = allo.customize(gemm)\ns2.reorder(\"k\", \"j\")\ns2.buffer_at(s2.C, axis=\"i\")\ns2.pipeline(\"j\")\nprint(s2.module)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that so far we have only optimized the separate kernels; the optimizations have not yet been incorporated into the top-level function, as the following printed module shows.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"s = allo.customize(top)\nprint(s.module)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Therefore, after each part has been optimized, we need to explicitly *compose* them together.\nIn Allo, we can use the ``.compose()`` primitive to compose the schedules together into the parent function.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"s.compose([s1, s2])\nprint(s.module)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the schedules for the ``matrix_add`` and ``gemm`` kernels are both correctly applied in the top-level function.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Template Composition\nSometimes we may define template kernels and invoke a kernel with different template arguments. Allo provides an *id* option to specify the exact kernel instance to be composed.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def kernel[T_in, T_out, S](A: \"T_in[S]\") -> \"T_out[S]\":\n    B: T_out[S] = 0\n    for i in range(S):\n        with allo.meta_if(T_out == int32):\n            B[i] = A[i] + 1\n        with allo.meta_else():\n            B[i] = A[i] * 2\n    return B\n\n\ndef top2(A: int32[M]) -> float32[M]:\n    C = kernel[int32, int32, M, \"K1\"](A)\n    D = kernel[int32, float32, M, \"K2\"](C)\n    return D"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Specifically, the last argument of the template kernel is the *id* of the kernel. Later on, we can use this ID to distinguish different kernel instances during composition.\nWe also customize the two template kernels with different optimizations first.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"s1 = allo.customize(kernel, instantiate=[int32, int32, M])\ns1.unroll(\"i\", factor=4)\nprint(s1.module)\n\ns2 = allo.customize(kernel, instantiate=[int32, float32, M])\ns2.pipeline(\"i\")\nprint(s2.module)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we compose the two template kernels into the top-level function with the ID specified.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"s = allo.customize(top2)\ns.compose(s1, id=\"K1\")\ns.compose(s2, id=\"K2\")\nprint(s.module)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see from the printed module that the loop in the first kernel is unrolled by a factor of 4, and the loop in the second kernel is pipelined.\n\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
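For reference, the numerics of the composed ``top`` function (a GEMM followed by an element-wise increment) can be checked with a plain-Python sketch. The ``*_ref`` helper names below are hypothetical reference implementations that do not require Allo:

```python
# Plain-Python reference for the composed `top` kernel in the notebook above.
# Small sizes keep the check fast; the Allo version uses M, K, N = 32.
M, K, N = 4, 4, 4


def matrix_add_ref(A):
    # B[i, j] = A[i, j] + 1
    return [[A[i][j] + 1 for j in range(N)] for i in range(M)]


def gemm_ref(A, B):
    # C[i, j] = sum over k of A[i, k] * B[k, j]
    C = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i][j] += A[i][k] * B[k][j]
    return C


def top_ref(A, B):
    # Composition: GEMM first, then the element-wise add.
    return matrix_add_ref(gemm_ref(A, B))


A = [[i * K + j for j in range(K)] for i in range(M)]
I = [[1 if i == j else 0 for j in range(K)] for i in range(M)]  # identity

# Multiplying by the identity leaves A unchanged, so top_ref(A, I) == A + 1.
assert top_ref(A, I) == [[A[i][j] + 1 for j in range(N)] for i in range(M)]
```

Scheduling primitives such as ``pipeline`` and ``buffer_at`` change how this computation is implemented in hardware, not what it computes, so a reference like this is a useful sanity check after composition.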
Binary file modified (+176 Bytes): _downloads/48b69635df4cfe1643d9d6b9bdf6cd79/tutorial_01_get_started.zip
_downloads/4fba383e419c1fc1ea22179140eb2d12/dive_01_data_types.py (114 additions, 0 deletions)
@@ -0,0 +1,114 @@
# Copyright Allo authors. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

"""
Data Types and Type Casting
===========================

**Author**: Hongzheng Chen ([email protected])

This document discusses the Allo-supported data types in detail.
All the data types are defined in the ``allo.ir.types`` module.
"""

import allo
from allo.ir.types import int16, int32, float32, Int, UInt, Float, Fixed

##############################################################################
# Currently, Allo supports three base data types for mathematical operations:
#
# - Integers: ``Int(bitwidth)``, ``UInt(bitwidth)``
# - Floating points: ``Float(bitwidth)`` (only 16, 32, and 64 bits are supported)
# - Fixed points: ``Fixed(bitwidth, frac)``, ``UFixed(bitwidth, frac)``
#
# For example, one can declare a 15-bit integer as ``Int(15)`` and an unsigned 8-bit fixed-point number with 3 fractional bits as ``UFixed(8, 3)``.
# For all the C/C++ supported data types, we provide shorthands like ``float32`` and ``int16`` to easily declare them.

# %%
# Notice that, unlike native Python, Allo requires the program to be **strongly and statically typed**.
# The variable types are either declared explicitly or inferred from the context.
# For a variable that first appears in the program, we should declare it with an expected data type using Python's type hint notation:

a: int32

# %%
# Once the data types are defined, an important consideration is how to handle
# operations between variables of different types. Allo supports two kinds of casting:
# (1) implicit casting, which is automatically done by the Allo compiler;
# and (2) explicit casting, which is manually done by the user.

##############################################################################
# Implicit Casting
# ----------------
# Allo has a strong type system that follows the `MLIR convention <https://mlir.llvm.org/docs/Dialects/ArithOps/>`_ to enforce that the operand types of an arithmetic operation are the same.
# However, it is burdensome for users to cast the variables every time, and manually avoiding overflow during computation is error-prone.
# Therefore, Allo is equipped with built-in casting rules that automatically cast the operands to the same type before the operation, which is called *implicit casting*.
# An example is shown below:


def add(a: int32, b: int32) -> int32:
    return a + b


s = allo.customize(add)
print(s.module)

# %%
# We can see that ``a`` and ``b`` are first cast to ``int33``, added
# together, and converted back to ``int32``.
# This avoids overflow and is automatically inferred by the Allo compiler.


##############################################################################
# Explicit Casting
# ----------------
# One can also explicitly cast a variable to a specific type by creating an intermediate variable,
# or use Python built-in functions like ``float()`` and ``int()`` to explicitly cast a variable to ``float32`` or ``int32``.
# Another example is shown below:


def cast(a: int32) -> int16:
    b: float32 = a  # explicit
    c: float32 = b * 2
    d: float32 = float(a) * 2
    e: int16 = c + d
    return e


s = allo.customize(cast)
print(s.module)

# %%
# By explicitly creating an intermediate variable ``b``, we can cast the ``int32`` variable ``a`` to the desired floating-point type.
# Similarly, calling ``float(a)`` also casts ``a`` to a floating-point type.
#
# .. note::
#
#    The explicit casting between integers and floating points stated above preserves the value, but the precision may change.
#    If you want to use a union type to represent both integers and floating points, please use the ``.bitcast()`` API instead. For example, ``a.bitcast()`` can reinterpret an ``int32`` as a ``float32`` with the bit pattern preserved.
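The value-preserving vs. bit-preserving distinction in the note above can be sketched in plain Python with the standard-library ``struct`` module. This is only an analogy for what ``.bitcast()`` does, not the Allo API itself:

```python
# Value cast vs. bit cast, sketched in pure Python.
import struct

a = 1065353216  # the int32 whose bit pattern (0x3F800000) encodes float 1.0

# Value-preserving cast: the number stays the same, the bits change.
value_cast = float(a)

# Bit-preserving reinterpretation (the analogy for `.bitcast()`):
# pack the int32 into 4 bytes, then unpack those same bytes as a float32.
bit_cast = struct.unpack("<f", struct.pack("<i", a))[0]

assert value_cast == 1065353216.0
assert bit_cast == 1.0
```

The two results differ wildly because they answer different questions: ``float(a)`` asks "what float has this value?", while a bitcast asks "what float has these bits?".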
##############################################################################
# Bit Operations
# --------------
# Since hardware accelerators have the ability to manipulate each bit of the data, Allo supports bit operations on
# those integer types. For example, we can access a specific bit of an integer ``a`` using the indexing operator:
#
# .. code-block:: python
#
#    a[15]

# %%
# We can also extract a chunk of bits from an integer using the slicing operator:
#
# .. code-block:: python
#
#    a[0:16]
#
# .. note::
#
#    Allo follows the Python convention that the upper bound is not included, so ``[0:16]`` means
#    extracting the first 16 bits, which is different from the Xilinx HLS convention that uses ``[0:15]``
#    to indicate the first 16 bits.

# %%
# Not only constant values but also variables can be used as the index or the slice range.
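The indexing and slicing semantics described above can be mimicked in plain Python with shifts and masks. The helper names below are hypothetical; Allo kernels express this directly on hardware integer types:

```python
# Pure-Python model of Allo's bit indexing and slicing on integers.
def get_bit(a: int, i: int) -> int:
    # a[i]: the i-th bit of a (bit 0 is the least significant)
    return (a >> i) & 1


def get_slice(a: int, lo: int, hi: int) -> int:
    # a[lo:hi]: bits lo..hi-1; the upper bound is excluded, Python-style
    width = hi - lo
    return (a >> lo) & ((1 << width) - 1)


a = 0xBEEF
assert get_bit(a, 0) == 1          # least-significant bit of 0xBEEF
assert get_slice(a, 0, 16) == 0xBEEF  # "first 16 bits" under the [0:16] convention
assert get_slice(a, 4, 8) == 0xE   # the second hex digit from the right
```

Because the slice bounds here are ordinary integers, the same helpers also illustrate the last point above: the index and the slice range can be variables, not just constants.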