[CPU] Use static compilation for kernels. #29

Merged

ienkovich
Collaborator

This patch switches kernel compilation from JIT to a static compiler. The new compilation flow has several advantages:

  • Compiled kernels are now cached
  • An ASM file is always produced as part of compilation (previously, it was produced by a separate compilation and could mismatch the JITed code)
  • The new code is simpler (no cpu_utils.so)
  • No more problems with additional LLVM lib instances in cpu_utils.so

This should make #25 and a part of #18 redundant.
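
The caching advantage listed above can be illustrated with a minimal sketch. This is not the code from this patch: `compile_asm_cached`, the cache directory layout, and the `run_compiler` callback are hypothetical names introduced for illustration. The sketch only assumes that the kernel's assembly text is hashed to form a cache key and that a compiler is invoked on a cache miss.

```python
import hashlib
import os


def compile_asm_cached(asm_text, cache_dir, run_compiler):
    """Return the path of a shared object built from asm_text, reusing a cached copy.

    run_compiler(asm_path, so_path) is a hypothetical callback that invokes the
    system compiler on the assembly file and writes the shared object.
    """
    # The cache key depends only on the assembly text (an assumption of this sketch).
    key = hashlib.sha256(asm_text.encode("utf-8")).hexdigest()
    entry_dir = os.path.join(cache_dir, key)
    asm_path = os.path.join(entry_dir, "kernel.s")
    so_path = os.path.join(entry_dir, "kernel.so")

    if os.path.exists(so_path):
        return so_path  # cache hit: skip compilation entirely

    os.makedirs(entry_dir, exist_ok=True)
    # The .s file is stored next to the binary built from it, so the two
    # always come from the same compilation.
    with open(asm_path, "w") as f:
        f.write(asm_text)
    run_compiler(asm_path, so_path)
    return so_path
```

Because the binary is built from the stored assembly file, the assembly and the executed code cannot diverge in this scheme, which is the property the second bullet describes.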

@ienkovich ienkovich requested a review from minjang June 20, 2024 15:13
@ienkovich ienkovich requested a review from ptillet as a code owner June 20, 2024 15:13
Collaborator

@minjang minjang left a comment

Thanks for simplifying the compilation step! We will follow up on ARM/Mac.

@@ -48,6 +48,8 @@ def _build(name, src, srcdir, library_dirs, include_dirs, libraries):
    # CPU backend uses C++ (driver.cpp). Some old version compilers need a specific C++17 flag.
    if src.endswith(".cpp") or src.endswith(".cc"):
        cc_cmd += ["-std=c++17", "-fopenmp"]
    if src.endswith(".s"):
        cc_cmd += ["-gdwarf-5"]
Collaborator

I think we can follow it up later on a different platform and compiler (e.g., Mac/clang), but this makes the problem much simpler: just compiling .s. Looks good.
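
For reference, the extension-based flag selection in the hunk above can be isolated into a small helper. This is only a sketch: `extra_flags_for` is a hypothetical name, and the flags are copied from the hunk rather than checked against other platforms or compilers, which is the follow-up mentioned in the comment.

```python
def extra_flags_for(src):
    """Return extra compiler flags for a source file, chosen by its extension."""
    flags = []
    # C++ sources (e.g. driver.cpp) get the C++17 and OpenMP flags.
    if src.endswith((".cpp", ".cc")):
        flags += ["-std=c++17", "-fopenmp"]
    # Assembly sources get the DWARF 5 debug-info flag, as in the hunk.
    if src.endswith(".s"):
        flags += ["-gdwarf-5"]
    return flags


print(extra_flags_for("driver.cpp"))
print(extra_flags_for("kernel.s"))
```

A per-platform variant would presumably branch on the compiler as well as the extension, but that is outside the scope of this patch.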

    ret = llvm.translate_to_bc(src)
    return ret

def make_asm(src, metadata, options):
    return llvm.translate_to_host_asm(src, options.enable_fp_fusion)
Collaborator

Did you measure the compilation time? It tends to be very long in some cases. We will certainly address it later.

Collaborator Author

It should be the same as we had for JIT. It can be quite long when we produce huge basic blocks (like tens or even hundreds of thousands of operations) because some optimizations can be O(basic_block_size^2). We should generate better code for those cases and use smaller block sizes for the CPU.
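
To give a rough sense of the scaling, here is a small arithmetic sketch. It assumes an idealized pass whose cost is exactly proportional to the square of the basic block size and ignores constant factors; `relative_quadratic_cost` and the chosen sizes are illustrative, not measurements from this PR.

```python
def relative_quadratic_cost(block_size, baseline_size):
    """Cost of an idealized O(n^2) pass on block_size operations,
    relative to the same pass on baseline_size operations."""
    return (block_size / baseline_size) ** 2


# Growing a basic block 10x makes the quadratic pass 100x more expensive.
for n in (1_000, 10_000, 100_000):
    print(n, relative_quadratic_cost(n, 1_000))
```

Under this model, a block of 100,000 operations costs 10,000 times as much as a block of 1,000 operations, which is why smaller block sizes on the CPU would be expected to help.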

@minjang minjang merged commit 1cf81ef into triton-lang:main Jun 20, 2024
2 of 4 checks passed
@ienkovich ienkovich deleted the ienkovich/cpu/static-compilation branch June 20, 2024 21:18
minjang pushed a commit to minjang/triton-cpu that referenced this pull request Jun 22, 2024
minjang pushed a commit that referenced this pull request Jun 24, 2024
Devjiu pushed a commit to Devjiu/triton-cpu that referenced this pull request Aug 13, 2024
int3 pushed a commit that referenced this pull request Aug 29, 2024
minjang pushed a commit that referenced this pull request Sep 22, 2024
minjang pushed a commit that referenced this pull request Oct 22, 2024
minjang pushed a commit that referenced this pull request Oct 24, 2024
int3 pushed a commit that referenced this pull request Dec 6, 2024
ienkovich added a commit that referenced this pull request Dec 6, 2024