Deploying to gh-pages from @ 9162879 🚀
Showing 60 changed files with 5,243 additions and 231 deletions.
@@ -1,4 +1,4 @@
  # Sphinx build info version 1
  # This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
- config: c9e032fa7f49dc809f164063dfb2e28e
+ config: dabe4189cc532017a30e750cb5d38b1e
  tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file modified (+0 Bytes): _downloads/2b236384a146de2d4a64081ddb0a7c9a/developer_01_ir_builder.zip
_downloads/39c6904b3f007c07e3d59200d0bf98b4/dive_03_composition.ipynb (183 additions, 0 deletions)
@@ -0,0 +1,183 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Kernel Composition\n\n**Author**: Hongzheng Chen ([email protected])\n\nThis document discusses kernel composition.\nIn the previous tutorials, we have seen how to write a simple kernel.\nIn real applications, however, we often need to compose multiple kernels together.\n\nIn the following example, we define a ``matrix_add`` and a ``gemm`` kernel, and wrap them into a ``top``-level function.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import allo\nfrom allo.ir.types import int32, float32\n\nM, K, N = 32, 32, 32\n\n\ndef matrix_add(A: int32[M, N]) -> int32[M, N]:\n    B: int32[M, N] = 0\n    for i, j in allo.grid(M, N):\n        B[i, j] = A[i, j] + 1\n    return B\n\n\ndef gemm(A: int32[M, K], B: int32[K, N]) -> int32[M, N]:\n    C: int32[M, N] = 0\n    for i, j in allo.grid(M, N):\n        for k in allo.reduction(K):\n            C[i, j] += A[i, k] * B[k, j]\n    return C\n\n\ndef top(A: int32[M, K], B: int32[K, N]) -> int32[M, N]:\n    C = gemm(A, B)\n    D = matrix_add(C)\n    return D"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Different teams or people can then work on different parts of the code and optimize each kernel.\nWe first create a schedule for the ``matrix_add`` kernel and add several optimizations.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"s1 = allo.customize(matrix_add)\ns1.pipeline(\"j\")\nprint(s1.module)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we create a schedule for the ``gemm`` kernel and optimize it.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"s2 = allo.customize(gemm)\ns2.reorder(\"k\", \"j\")\ns2.buffer_at(s2.C, axis=\"i\")\ns2.pipeline(\"j\")\nprint(s2.module)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that so far we have only optimized the separate kernels; the optimizations have not yet been incorporated into the top-level function, as the following printed module shows.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"s = allo.customize(top)\nprint(s.module)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Therefore, after each part has been optimized, we need to explicitly *compose* them together.\nIn Allo, we can use the ``.compose()`` primitive to compose the schedules together into the parent function.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"s.compose([s1, s2])\nprint(s.module)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the schedules for the ``matrix_add`` and ``gemm`` kernels are both correctly applied in the top-level function.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Template Composition\nSometimes we may define template kernels and invoke a kernel with different template arguments. Allo provides an *id* option to specify the exact kernel instance to be composed.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def kernel[T_in, T_out, S](A: \"T_in[S]\") -> \"T_out[S]\":\n    B: T_out[S] = 0\n    for i in range(S):\n        with allo.meta_if(T_out == int32):\n            B[i] = A[i] + 1\n        with allo.meta_else():\n            B[i] = A[i] * 2\n    return B\n\n\ndef top2(A: int32[M]) -> float32[M]:\n    C = kernel[int32, int32, M, \"K1\"](A)\n    D = kernel[int32, float32, M, \"K2\"](C)\n    return D"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Specifically, the last argument of the template kernel is the *id* of the kernel. Later on, we can use this ID to distinguish different kernel instances during composition.\nWe also customize the two template kernels with different optimizations first.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"s1 = allo.customize(kernel, instantiate=[int32, int32, M])\ns1.unroll(\"i\", factor=4)\nprint(s1.module)\n\ns2 = allo.customize(kernel, instantiate=[int32, float32, M])\ns2.pipeline(\"i\")\nprint(s2.module)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we compose the two template kernels into the top-level function with the ID specified.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"s = allo.customize(top2)\ns.compose(s1, id=\"K1\")\ns.compose(s2, id=\"K2\")\nprint(s.module)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see from the printed module that the loop in the first kernel is unrolled by a factor of 4, and the loop in the second kernel is pipelined.\n\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
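For reference, the numerics of the composed ``top`` function (a GEMM followed by an element-wise increment) can be checked with a plain-Python sketch. The ``*_ref`` helper names below are hypothetical reference implementations that do not require Allo:

```python
# Plain-Python reference for the composed `top` kernel in the notebook above.
# Small sizes keep the check fast; the Allo version uses M, K, N = 32.
M, K, N = 4, 4, 4


def matrix_add_ref(A):
    # B[i, j] = A[i, j] + 1
    return [[A[i][j] + 1 for j in range(N)] for i in range(M)]


def gemm_ref(A, B):
    # C[i, j] = sum over k of A[i, k] * B[k, j]
    C = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i][j] += A[i][k] * B[k][j]
    return C


def top_ref(A, B):
    # Composition: GEMM first, then the element-wise add.
    return matrix_add_ref(gemm_ref(A, B))


A = [[i * K + j for j in range(K)] for i in range(M)]
I = [[1 if i == j else 0 for j in range(K)] for i in range(M)]  # identity

# Multiplying by the identity leaves A unchanged, so top_ref(A, I) == A + 1.
assert top_ref(A, I) == [[A[i][j] + 1 for j in range(N)] for i in range(M)]
```

Scheduling primitives such as ``pipeline`` and ``buffer_at`` change how this computation is implemented in hardware, not what it computes, so a reference like this is a useful sanity check after composition.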
Binary file modified (+176 Bytes): _downloads/48b69635df4cfe1643d9d6b9bdf6cd79/tutorial_01_get_started.zip
_downloads/4fba383e419c1fc1ea22179140eb2d12/dive_01_data_types.py (114 additions, 0 deletions)
@@ -0,0 +1,114 @@
# Copyright Allo authors. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

"""
Data Types and Type Casting
===========================

**Author**: Hongzheng Chen ([email protected])

This document discusses the Allo-supported data types in detail.
All the data types are defined in the ``allo.ir.types`` module.
"""

import allo
from allo.ir.types import int16, int32, float32, Int, UInt, Float, Fixed

##############################################################################
# Currently, Allo supports three base data types for mathematical operations:
#
# - Integers: ``Int(bitwidth)``, ``UInt(bitwidth)``
# - Floating points: ``Float(bitwidth)`` (only 16, 32, and 64 bits are supported)
# - Fixed points: ``Fixed(bitwidth, frac)``, ``UFixed(bitwidth, frac)``
#
# For example, one can declare a 15-bit integer as ``Int(15)`` and an unsigned 8-bit fixed-point number with 3 fractional bits as ``UFixed(8, 3)``.
# For all the C/C++ supported data types, we provide shorthands like ``float32`` and ``int16`` to easily declare them.

# %%
# Notice that, unlike native Python, Allo requires the program to be **strongly and statically typed**.
# The variable types are either declared explicitly or inferred from the context.
# For a variable that first appears in the program, we should declare it with an expected data type using Python's type hint notation:

a: int32

# %%
# Once the data types are defined, an important consideration is how to handle
# operations between variables of different types. Allo supports two kinds of casting:
# (1) implicit casting, which is automatically done by the Allo compiler;
# and (2) explicit casting, which is manually done by the user.

##############################################################################
# Implicit Casting
# ----------------
# Allo has a strong type system that follows the `MLIR convention <https://mlir.llvm.org/docs/Dialects/ArithOps/>`_ to enforce that the operand types of an arithmetic operation are the same.
# However, it is burdensome for users to cast the variables every time, and manually avoiding overflow during computation is error-prone.
# Therefore, Allo is equipped with built-in casting rules that automatically cast the operands to the same type before the operation, which is called *implicit casting*.
# An example is shown below:


def add(a: int32, b: int32) -> int32:
    return a + b


s = allo.customize(add)
print(s.module)

# %%
# We can see that ``a`` and ``b`` are first cast to ``int33``, added
# together, and converted back to ``int32``.
# This avoids overflow and is automatically inferred by the Allo compiler.


##############################################################################
# Explicit Casting
# ----------------
# One can also explicitly cast a variable to a specific type by creating an intermediate variable,
# or use Python built-in functions like ``float()`` and ``int()`` to explicitly cast a variable to ``float32`` or ``int32``.
# Another example is shown below:


def cast(a: int32) -> int16:
    b: float32 = a  # explicit
    c: float32 = b * 2
    d: float32 = float(a) * 2
    e: int16 = c + d
    return e


s = allo.customize(cast)
print(s.module)

# %%
# By explicitly creating an intermediate variable ``b``, we can cast the ``int32`` variable ``a`` to the desired floating-point type.
# Similarly, calling ``float(a)`` also casts ``a`` to a floating-point type.
#
# .. note::
#
#    The explicit casting between integers and floating points stated above preserves the value, but the precision may change.
#    If you want to use a union type to represent both integers and floating points, please use the ``.bitcast()`` API instead. For example, ``a.bitcast()`` can reinterpret an ``int32`` as a ``float32`` with the bit pattern preserved.
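The value-preserving vs. bit-preserving distinction in the note above can be sketched in plain Python with the standard-library ``struct`` module. This is only an analogy for what ``.bitcast()`` does, not the Allo API itself:

```python
# Value cast vs. bit cast, sketched in pure Python.
import struct

a = 1065353216  # the int32 whose bit pattern (0x3F800000) encodes float 1.0

# Value-preserving cast: the number stays the same, the bits change.
value_cast = float(a)

# Bit-preserving reinterpretation (the analogy for `.bitcast()`):
# pack the int32 into 4 bytes, then unpack those same bytes as a float32.
bit_cast = struct.unpack("<f", struct.pack("<i", a))[0]

assert value_cast == 1065353216.0
assert bit_cast == 1.0
```

The two results differ wildly because they answer different questions: ``float(a)`` asks "what float has this value?", while a bitcast asks "what float has these bits?".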
##############################################################################
# Bit Operations
# --------------
# Since hardware accelerators have the ability to manipulate each bit of the data, Allo supports bit operations on
# those integer types. For example, we can access a specific bit of an integer ``a`` using the indexing operator:
#
# .. code-block:: python
#
#    a[15]

# %%
# We can also extract a chunk of bits from an integer using the slicing operator:
#
# .. code-block:: python
#
#    a[0:16]
#
# .. note::
#
#    Allo follows the Python convention that the upper bound is not included, so ``[0:16]`` means
#    extracting the first 16 bits, which is different from the Xilinx HLS convention that uses ``[0:15]``
#    to indicate the first 16 bits.

# %%
# Not only constant values but also variables can be used as the index or the slice range.
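The indexing and slicing semantics described above can be mimicked in plain Python with shifts and masks. The helper names below are hypothetical; Allo kernels express this directly on hardware integer types:

```python
# Pure-Python model of Allo's bit indexing and slicing on integers.
def get_bit(a: int, i: int) -> int:
    # a[i]: the i-th bit of a (bit 0 is the least significant)
    return (a >> i) & 1


def get_slice(a: int, lo: int, hi: int) -> int:
    # a[lo:hi]: bits lo..hi-1; the upper bound is excluded, Python-style
    width = hi - lo
    return (a >> lo) & ((1 << width) - 1)


a = 0xBEEF
assert get_bit(a, 0) == 1          # least-significant bit of 0xBEEF
assert get_slice(a, 0, 16) == 0xBEEF  # "first 16 bits" under the [0:16] convention
assert get_slice(a, 4, 8) == 0xE   # the second hex digit from the right
```

Because the slice bounds here are ordinary integers, the same helpers also illustrate the last point above: the index and the slice range can be variables, not just constants.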