[CPU] Use static compilation for kernels. #29
Changes from all commits
@@ -42,7 +42,7 @@ def supports_target(target: GPUTarget):

    def __init__(self, target: tuple) -> None:
        super().__init__(target)
-        self.binary_ext = "bc"
+        self.binary_ext = "asm"

    def parse_options(self, opts) -> Any:
        args = {k: opts[k] for k in CPUOptions.__dataclass_fields__.keys() if k in opts}
@@ -138,22 +138,14 @@ def make_llir(src, metadata, options):
        return ret

    @staticmethod
-    def make_bc(src, metadata, options):
-        if os.environ.get("TRITON_CPU_ASM_DUMP", "0") == "1":
-            from triton.runtime.cache import get_cache_manager
-
-            asm = llvm.translate_to_host_asm(src, options.enable_fp_fusion)
-            fn_cache_manager = get_cache_manager(metadata['hash'])
-            fn_cache_manager.put(asm, f"{metadata['name']}.asm")
-
-        ret = llvm.translate_to_bc(src)
-        return ret
+    def make_asm(src, metadata, options):
+        return llvm.translate_to_host_asm(src, options.enable_fp_fusion)
Review comment: Did you measure the compilation time? It tends to be very long in some cases. For sure, we will address it later.

Reply: It should be the same as we had for JIT. It can be quite long when we produce huge basic blocks (tens or even hundreds of thousands of operations) because some optimizations can be O(basic_block_size^2). We should generate better code for those cases and use smaller block sizes for the CPU.
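As an aside on the reply above (not part of this change): the kernel below is a standard Triton vector add, and the smaller block size passed for a CPU run is only an illustrative assumption about tuning, not an API or default taken from this PR.

import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles BLOCK_SIZE contiguous elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def launch(x, y, out, block_size=1024):
    # Hypothetical tuning choice: very large BLOCK_SIZE values can lower to huge
    # LLVM basic blocks, and some optimizations scale as O(basic_block_size^2),
    # so a CPU target may prefer a smaller block than is typical on GPUs.
    n = out.numel()
    grid = (triton.cdiv(n, block_size),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=block_size)

Note that this tuning happens in user code; the backend change in this PR only switches the final compilation stage from bitcode to host assembly, as the rest of the diff shows.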
    def add_stages(self, stages, options):
        stages["ttir"] = lambda src, metadata: self.make_ttir(src, metadata, options)
        stages["ttcir"] = lambda src, metadata: self.make_ttcir(src, metadata, options)
        stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options)
-        stages["bc"] = lambda src, metadata: self.make_bc(src, metadata, options)
+        stages["asm"] = lambda src, metadata: self.make_asm(src, metadata, options)

    @functools.lru_cache()
    def hash(self):
This file was deleted.
Review comment: I think we can follow it up later on a different platform and compiler (e.g., Mac/clang), but this makes the problem much simpler: just compiling .s. Looks good.
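For context on "just compiling .s", here is a minimal sketch of one way the emitted host assembly could be turned into a loadable library with the system toolchain. It is an assumption about how a launcher might consume the .asm artifact, not the loader code from this repository; the gcc invocation, paths, and kernel_name parameter are illustrative.

import ctypes
import os
import subprocess
import tempfile


def compile_asm_to_library(asm_text: str, kernel_name: str) -> ctypes.CDLL:
    # Write the generated assembly to disk, assemble and link it into a shared
    # object with the host compiler, then load it into the current process.
    tmpdir = tempfile.mkdtemp()
    asm_path = os.path.join(tmpdir, f"{kernel_name}.s")
    so_path = os.path.join(tmpdir, f"{kernel_name}.so")
    with open(asm_path, "w") as f:
        f.write(asm_text)
    # gcc here is an assumption; on macOS the review suggests clang instead.
    subprocess.check_call(["gcc", "-shared", "-fPIC", "-o", so_path, asm_path])
    return ctypes.CDLL(so_path)

Loading through ctypes is only one option; the point of the switch is that the backend now hands a plain assembly artifact to whatever performs the final compile and load, which matches the reviewer's observation that the remaining work is "just compiling .s".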