Fixed build and test workflow for Intel self-hosted runner #17

Merged
merged 7 commits into main on Jun 9, 2024

Conversation

gshimansky
Collaborator

This workflow is now able to execute tests; some of them still fail, but many pass.

Signed-off-by: Gregory Shimansky <[email protected]>
@gshimansky requested a review from ptillet as a code owner June 8, 2024 02:04
@minjang merged commit a6e6362 into main Jun 9, 2024
3 of 6 checks passed
minjang pushed a commit to minjang/triton-cpu that referenced this pull request Jun 22, 2024
…ng#17)

* Fixed yaml syntax

Signed-off-by: Gregory Shimansky <[email protected]>

* Removed cpu label from run-on

Signed-off-by: Gregory Shimansky <[email protected]>

* Added missing zlib-dev

Signed-off-by: Gregory Shimansky <[email protected]>

* Added missing apt-get update

Signed-off-by: Gregory Shimansky <[email protected]>

* Remove pip cache because on self-hosted runner it slows things down

Signed-off-by: Gregory Shimansky <[email protected]>

* Corrected path to tests

Signed-off-by: Gregory Shimansky <[email protected]>

* Added installation of torch==2.1.2

Signed-off-by: Gregory Shimansky <[email protected]>

---------

Signed-off-by: Gregory Shimansky <[email protected]>
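
Taken together, the commits above amount to a handful of CI changes. As a rough sketch, assuming a typical Ubuntu-based GitHub Actions job (step names, the Debian zlib package name, and the test path are illustrative; only `torch==2.1.2` and the removed `cpu` label come from the commits), the fixed workflow steps might look like:

```yaml
# Hypothetical excerpt reflecting the fixes listed above -- not the
# actual workflow file from this PR.
jobs:
  build-and-test:
    runs-on: self-hosted        # plain label; the "cpu" label was removed
    steps:
      - uses: actions/checkout@v4
      - name: Install system packages
        run: |
          sudo apt-get update                 # added: refresh indexes first
          sudo apt-get install -y zlib1g-dev  # added missing zlib dev package
      - name: Install Python dependencies
        # no pip cache step: on a self-hosted runner caching slows things down
        run: pip install torch==2.1.2         # pinned torch from the commits
      - name: Run tests
        run: python -m pytest python/test/unit  # corrected test path (illustrative)
```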
minjang pushed a commit that referenced this pull request Jun 24, 2024
When running
[convert_blocked1d_to_slice0](https://github.com/triton-lang/triton/blob/0ba5f0c3cd029d5c3d1f01b9bf29dac32c27345e/test/Conversion/tritongpu_to_llvm.mlir#L924)
Triton ends up computing the rank of a matrix with 0 columns during linear
layout lowering, which trips up f2reduce and causes undefined behavior,
detectable through
[UBSAN](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html).

Fix this by returning the rank (0) early in these cases, without calling
f2reduce.

<details><summary>Stack trace</summary>
<p>

```
third_party/triton/third_party/f2reduce/f2reduce.cpp:421:30: runtime error: shift exponent 18446744073709551615 is too large for 64-bit type 'unsigned long long'
    #0 0x556ee2fea3be in inplace_rref_small third_party/triton/third_party/f2reduce/f2reduce.cpp:421:30
    #1 0x556ee2fea3be in f2reduce::inplace_rref_strided(unsigned long*, unsigned long, unsigned long, unsigned long) third_party/triton/third_party/f2reduce/f2reduce.cpp:470:9
    #2 0x556ee2ea70da in getMatrixRank third_party/triton/lib/Tools/LinearLayout.cpp:125:3
    #3 0x556ee2ea70da in mlir::triton::LinearLayout::checkInvariants(bool) third_party/triton/lib/Tools/LinearLayout.cpp:299:7
    #4 0x556ee2ea656d in mlir::triton::LinearLayout::tryCreate(llvm::MapVector<mlir::StringAttr, std::__u::vector<std::__u::vector<int, std::__u::allocator<int>>, std::__u::allocator<std::__u::vector<int, std::__u::allocator<int>>>>, llvm::DenseMap<mlir::StringAttr, unsigned int, llvm::DenseMapInfo<mlir::StringAttr, void>, llvm::detail::DenseMapPair<mlir::StringAttr, unsigned int>>, llvm::SmallVector<std::__u::pair<mlir::StringAttr, std::__u::vector<std::__u::vector<int, std::__u::allocator<int>>, std::__u::allocator<std::__u::vector<int, std::__u::allocator<int>>>>>, 0u>>, llvm::ArrayRef<std::__u::pair<mlir::StringAttr, int>>, bool) third_party/triton/lib/Tools/LinearLayout.cpp:190:41
    #5 0x556ee2eb2150 in mlir::triton::LinearLayout::divideRight(mlir::triton::LinearLayout const&) third_party/triton/lib/Tools/LinearLayout.cpp:654:51
    #6 0x556ee2ee1c39 in mlir::cvtNeedsSharedMemory(mlir::RankedTensorType, mlir::RankedTensorType) third_party/triton/lib/Analysis/Utility.cpp:652:14
    #7 0x556ee2cf38fd in mlir::triton::getRepShapeForCvtLayout(mlir::triton::gpu::ConvertLayoutOp) third_party/triton/lib/Analysis/Allocation.cpp:66:8
    #8 0x556ee2cf3efa in mlir::triton::getScratchConfigForCvtLayout(mlir::triton::gpu::ConvertLayoutOp, unsigned int&, unsigned int&) third_party/triton/lib/Analysis/Allocation.cpp:95:19
    #9 0x556ee2cf6057 in mlir::triton::AllocationAnalysis::getScratchValueSize(mlir::Operation*) third_party/triton/lib/Analysis/Allocation.cpp:272:24
    #10 0x556ee2cf5499 in operator() third_party/triton/lib/Analysis/Allocation.cpp:343:7
    #11 0x556ee2cf5499 in void llvm::function_ref<void (mlir::Operation*)>::callback_fn<mlir::triton::AllocationAnalysis::getValuesAndSizes()::'lambda'(mlir::Operation*)>(long, mlir::Operation*) third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #12 0x556edeeee7a9 in operator() third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #13 0x556edeeee7a9 in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:174:5
    #14 0x556edeeee87c in void mlir::detail::walk<mlir::ForwardIterator>(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>, mlir::WalkOrder) third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:182:9
    #15 0x556ee2cf49e7 in walk<(mlir::WalkOrder)0, mlir::ForwardIterator, (lambda at third_party/triton/lib/Analysis/Allocation.cpp:341:42), mlir::Operation *, void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Visitors.h:313:10
    #16 0x556ee2cf49e7 in walk<(mlir::WalkOrder)0, mlir::ForwardIterator, (lambda at third_party/triton/lib/Analysis/Allocation.cpp:341:42), void> third_party/llvm/llvm-project/mlir/include/mlir/IR/Operation.h:794:12
    #17 0x556ee2cf49e7 in mlir::triton::AllocationAnalysis::getValuesAndSizes() third_party/triton/lib/Analysis/Allocation.cpp:341:16
    #18 0x556ee2cf4852 in run third_party/triton/lib/Analysis/Allocation.cpp:182:5
    #19 0x556ee2cf4852 in AllocationAnalysis third_party/triton/lib/Analysis/Allocation.cpp:169:5
    #20 0x556ee2cf4852 in mlir::Allocation::run(llvm::DenseMap<mlir::FunctionOpInterface, mlir::Allocation, llvm::DenseMapInfo<mlir::FunctionOpInterface, void>, llvm::detail::DenseMapPair<mlir::FunctionOpInterface, mlir::Allocation>>&) third_party/triton/lib/Analysis/Allocation.cpp:627:3
    #21 0x556ee1677402 in operator() third_party/triton/include/triton/Analysis/Allocation.h:227:26
    #22 0x556ee1677402 in void mlir::CallGraph<mlir::Allocation>::doWalk<(mlir::WalkOrder)0, (mlir::WalkOrder)1, mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::CallOpInterface, mlir::FunctionOpInterface), mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::FunctionOpInterface)>(mlir::FunctionOpInterface, llvm::DenseSet<mlir::FunctionOpInterface, llvm::DenseMapInfo<mlir::FunctionOpInterface, void>>&, mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::CallOpInterface, mlir::FunctionOpInterface), mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp)::'lambda'(mlir::FunctionOpInterface)) third_party/triton/include/triton/Analysis/Utility.h:350:7
    #23 0x556ee16756b3 in walk<(mlir::WalkOrder)0, (mlir::WalkOrder)1, (lambda at third_party/triton/include/triton/Analysis/Allocation.h:222:9), (lambda at third_party/triton/include/triton/Analysis/Allocation.h:224:9)> third_party/triton/include/triton/Analysis/Utility.h:242:7
    #24 0x556ee16756b3 in mlir::ModuleAllocation::ModuleAllocation(mlir::ModuleOp) third_party/triton/include/triton/Analysis/Allocation.h:220:5
    #25 0x556ee2c2bf18 in (anonymous namespace)::AllocateSharedMemory::runOnOperation() third_party/triton/lib/Conversion/TritonGPUToLLVM/AllocateSharedMemory.cpp:26:22
...
UndefinedBehaviorSanitizer: invalid-shift-exponent third_party/triton/third_party/f2reduce/f2reduce.cpp:421:30 
```
</p>
</details>
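
For reference, here is a standalone sketch of the early-return idea: Gaussian elimination over GF(2) on bit-packed rows, returning rank 0 immediately for degenerate shapes instead of letting the elimination routine derive an undefined shift from a zero column count. This is an illustration only, not the actual patch to getMatrixRank in lib/Tools/LinearLayout.cpp.

```cpp
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

// Rank of a bit-matrix over GF(2); each uint64_t holds one row
// (assumes numCols <= 64). The early return is the point: a matrix
// with 0 rows or 0 columns has rank 0 by definition, so elimination
// (and its shift by numCols - 1, which underflows when numCols == 0)
// is skipped entirely.
int rankGF2(std::vector<uint64_t> rows, int numCols) {
  if (rows.empty() || numCols == 0)
    return 0; // degenerate shape: rank 0, no elimination

  int rank = 0;
  for (int col = 0; col < numCols; ++col) {
    uint64_t mask = uint64_t{1} << col; // safe: 0 <= col < numCols <= 64
    // Find a pivot row with a 1 in this column.
    int pivot = -1;
    for (int r = rank; r < (int)rows.size(); ++r) {
      if (rows[r] & mask) { pivot = r; break; }
    }
    if (pivot < 0)
      continue;
    std::swap(rows[rank], rows[pivot]);
    // Clear this column in every other row (XOR is addition in GF(2)).
    for (int r = 0; r < (int)rows.size(); ++r) {
      if (r != rank && (rows[r] & mask))
        rows[r] ^= rows[rank];
    }
    ++rank;
  }
  return rank;
}

int main() {
  std::cout << rankGF2({}, 0) << "\n";             // 0: degenerate, no UB
  std::cout << rankGF2({0b101, 0b011}, 3) << "\n"; // 2
}
```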
minjang pushed a commit that referenced this pull request Jun 24, 2024
Devjiu pushed a commit to Devjiu/triton-cpu that referenced this pull request Aug 13, 2024
int3 pushed a commit that referenced this pull request Aug 29, 2024
minjang pushed a commit that referenced this pull request Sep 22, 2024
minjang pushed a commit that referenced this pull request Oct 22, 2024
minjang pushed a commit that referenced this pull request Oct 24, 2024
Devjiu pushed a commit to Devjiu/triton-cpu that referenced this pull request Nov 13, 2024
Adds a Python wrapper for a parallelized in-place copy function using libxsmm and OpenMP.
It is intended to be used for an efficient tensor padding implementation.

The libxsmm paths have to be specified through env variables:
  - XSMM_ROOT_DIR - path to the libxsmm root dir with headers
  - XSMM_LIB_DIR - path to the libxsmm.so location

The libxsmm .so also has to be available at runtime, e.g., exposed through LD_LIBRARY_PATH.
The XSMM Python module can be built and installed using the command:
  pip install -e ./third_party/cpu/python/
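
The parallelization pattern behind such a wrapper, as a hypothetical standalone sketch: plain memcpy/memset stand in for the libxsmm kernels, and paddedCopy is an invented name, not the wrapper's actual API.

```cpp
#include <cstring>
#include <iostream>
#include <vector>

// Row-parallel padded copy: each OpenMP thread copies one source row
// into a wider destination row and zero-fills the padding tail.
// Assumes dstCols >= srcCols. The real wrapper dispatches to libxsmm
// kernels instead of memcpy/memset; this only shows the pattern.
void paddedCopy(const float *src, float *dst,
                int rows, int srcCols, int dstCols) {
#pragma omp parallel for
  for (int r = 0; r < rows; ++r) {
    std::memcpy(dst + (size_t)r * dstCols,
                src + (size_t)r * srcCols,
                sizeof(float) * srcCols);
    std::memset(dst + (size_t)r * dstCols + srcCols, 0,
                sizeof(float) * (dstCols - srcCols));
  }
}

int main() {
  std::vector<float> src = {1, 2, 3, 4}; // 2x2 tensor
  std::vector<float> dst(2 * 4, -1.0f);  // pad each row to 4 columns
  paddedCopy(src.data(), dst.data(), 2, 2, 4);
  for (float v : dst) std::cout << v << " "; // 1 2 0 0 3 4 0 0
  std::cout << "\n";
}
```

Compile with -fopenmp to get the parallel loop (it runs serially without it); the XSMM_ROOT_DIR/XSMM_LIB_DIR variables above matter only for the real libxsmm-backed build.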
Devjiu pushed a commit to Devjiu/triton-cpu that referenced this pull request Nov 13, 2024
ienkovich pushed a commit to ienkovich/triton-cpu that referenced this pull request Nov 20, 2024
int3 pushed a commit that referenced this pull request Dec 6, 2024
ienkovich pushed a commit that referenced this pull request Dec 6, 2024
Devjiu pushed a commit to Devjiu/triton-cpu that referenced this pull request Jan 20, 2025