[CPU] Use static compilation for kernels. #29
Signed-off-by: Ilya Enkovich <[email protected]>
Thanks for simplifying the compilation step! We will follow up on ARM/Mac.
@@ -48,6 +48,8 @@ def _build(name, src, srcdir, library_dirs, include_dirs, libraries):
    # CPU backend uses C++ (driver.cpp). Some old version compilers need a specific C++17 flag.
    if src.endswith(".cpp") or src.endswith(".cc"):
        cc_cmd += ["-std=c++17", "-fopenmp"]
    if src.endswith(".s"):
        cc_cmd += ["-gdwarf-5"]
I think we can follow up later for other platforms and compilers (e.g., Mac/clang), but this makes the problem much simpler: we only need to compile a .s file. Looks good.
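To make the flag selection above concrete, here is a minimal sketch of how a compiler command could be assembled per source extension. The helper name `build_cc_cmd` and the base flags (`-O3`, `-shared`, `-fPIC`) are assumptions for illustration, not taken from the patch; only the `.cpp`/`.cc` and `.s` branches mirror the diff.

```python
import os


def build_cc_cmd(cc, src, out, extra_flags=()):
    """Hypothetical helper: assemble a compiler command for one source file."""
    # Base flags are an assumption; the patch only shows the per-extension additions.
    cmd = [cc, src, "-O3", "-shared", "-fPIC", "-o", out]
    ext = os.path.splitext(src)[1]
    if ext in (".cpp", ".cc"):
        # C++ sources (e.g. driver.cpp) need an explicit standard on older compilers.
        cmd += ["-std=c++17", "-fopenmp"]
    if ext == ".s":
        # Assembly sources get an explicit DWARF version, as in the diff.
        cmd += ["-gdwarf-5"]
    cmd += list(extra_flags)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_cc_cmd("gcc", "kernel.s", "kernel.so")))
```

Keeping the command as a list (rather than a shell string) lets it be passed directly to `subprocess.check_call` without quoting issues.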
    ret = llvm.translate_to_bc(src)
    return ret
def make_asm(src, metadata, options):
    return llvm.translate_to_host_asm(src, options.enable_fp_fusion)
Did you measure the compilation time? It tends to be very long in some cases. In any case, we can address that later.
It should be the same as we had for JIT. It can be quite long when we produce huge basic blocks (like tens or even hundreds of thousands of operations) because some optimizations can be O(basic_block_size^2). We should generate better code for those cases and use smaller block sizes for the CPU.
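As a rough illustration of the quadratic argument, the toy model below assumes a pass whose cost is exactly n^2 for a basic block of n operations and ignores all linear terms. The function name and the numbers are hypothetical; real compile times will not follow this model exactly.

```python
def relative_cost(total_ops, block_size):
    """Toy cost of processing total_ops split into basic blocks of block_size.

    Assumes a pass that costs n**2 units on a block of n operations.
    """
    full_blocks, remainder = divmod(total_ops, block_size)
    return full_blocks * block_size**2 + remainder**2


one_block = relative_cost(100_000, 100_000)  # a single huge basic block
split = relative_cost(100_000, 1_000)        # the same work in 100 smaller blocks
print(one_block // split)  # prints 100
```

Under this assumption, the same number of operations spread over blocks 100 times smaller costs 100 times less in the quadratic pass.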
This patch moves from JIT compilation for kernels to static compiler usage. The modified compilation flow has several advantages:
cpu_utils.so
This should make #25 and a part of #18 redundant.
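The static flow described above (emit assembly, compile it with the system compiler, load the resulting shared library) can be sketched as follows. This is a stand-in under stated assumptions: it uses a C compiler's `-S` output in place of the LLVM-generated host assembly, and the names `compile_and_load`, `kernel.s`, and `add_one` are hypothetical, not part of the patch.

```python
import ctypes
import os
import shutil
import subprocess
import tempfile

# Stand-in "kernel"; in the patch, the assembly comes from LLVM instead.
C_SRC = "int add_one(int x) { return x + 1; }\n"


def compile_and_load(c_src, workdir, cc=None):
    """Hypothetical sketch: source -> assembly -> shared library -> loaded handle."""
    cc = cc or shutil.which("cc") or shutil.which("gcc") or shutil.which("clang")
    if cc is None:
        raise RuntimeError("no C compiler found on PATH")
    c_path = os.path.join(workdir, "kernel.c")
    asm_path = os.path.join(workdir, "kernel.s")
    so_path = os.path.join(workdir, "kernel.so")
    with open(c_path, "w") as f:
        f.write(c_src)
    # Step 1: produce position-independent assembly for the host.
    subprocess.check_call([cc, "-S", "-fPIC", "-O2", c_path, "-o", asm_path])
    # Step 2: statically compile the .s file into a shared library.
    subprocess.check_call([cc, "-shared", "-fPIC", asm_path, "-o", so_path])
    # Step 3: load the library; no JIT engine is involved at run time.
    return ctypes.CDLL(so_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        lib = compile_and_load(C_SRC, workdir)
        print(lib.add_one(41))
```

Because the output is an ordinary shared library on disk, it can be cached and reloaded like any other build artifact.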