Enzyme fails to differentiate KA.jl kernel in Julia 1.11 #2198

Open
cncastillo opened this issue Dec 14, 2024 · 7 comments

@cncastillo

Hi! First of all, thank you very much for this amazing package 😄. I have been struggling to get this simple example to work on Julia 1.11.2 (it works on Julia 1.10.7):

using CUDA, KernelAbstractions, Enzyme

c = 1:64
@kernel function square!(x, @Const(c))
    I = @index(Global, Linear)
    @inbounds x[I] = c[I] * x[I] ^ 2
end

function f!(x, backend)
    kernel = square!(backend)
    kernel(x, c, ndrange = size(x))
    KernelAbstractions.synchronize(backend)
end

x = CUDA.ones(64)
backend = KernelAbstractions.get_backend(x)

∂f_∂x = similar(x)
∂f_∂x .= 1.0
Enzyme.autodiff(
    Reverse, 
    f!, 
    Duplicated(x, ∂f_∂x), 
    Const(backend)
)

∂f_∂x

When running this code I get:

ERROR: Enzyme compilation failed due to an internal error.
 Please open an issue with the code to reproduce and full error log on github.com/EnzymeAD/Enzyme.jl
 To toggle more information for debugging (needed for bug reports), set Enzyme.Compiler.VERBOSE_ERRORS[] = true (default false)

Stacktrace:
 [1] setindex!
   @ ./array.jl:987
 [2] _sort!
   @ ./sort.jl:831
 [3] multiple call sites
   @ unknown:0

Stacktrace:
  [1] (::Enzyme.Compiler.var"#getparent#69"{})(v::LLVM.Value, offset::LLVM.Value, hasload::Bool)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/haqjK/src/llvm/transforms.jl:888
  [2] (::Enzyme.Compiler.var"#getparent#69"{})(v::LLVM.Value, offset::LLVM.Value, hasload::Bool)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/haqjK/src/llvm/transforms.jl:777
  [3] nodecayed_phis!(mod::LLVM.Module)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/haqjK/src/llvm/transforms.jl:891
  [4] optimize!(mod::LLVM.Module, tm::LLVM.TargetMachine)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/haqjK/src/compiler/optimize.jl:582
  [5] codegen(output::Symbol, job::GPUCompiler.CompilerJob{…}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/haqjK/src/compiler.jl:4096
  [6] codegen
    @ ~/.julia/packages/Enzyme/haqjK/src/compiler.jl:3223 [inlined]
  [7] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/haqjK/src/compiler.jl:5273
  [8] _thunk
    @ ~/.julia/packages/Enzyme/haqjK/src/compiler.jl:5273 [inlined]
  [9] cached_compilation
    @ ~/.julia/packages/Enzyme/haqjK/src/compiler.jl:5324 [inlined]
 [10] thunkbase(mi::Core.MethodInstance, World::UInt64, FA::Type{…}, A::Type{…}, TT::Type, Mode::Enzyme.API.CDerivativeMode, width::Int64, ModifiedBetween::NTuple{…} where N, ReturnPrimal::Bool, ShadowInit::Bool, ABI::Type, ErrIfFuncWritten::Bool, RuntimeActivity::Bool)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/haqjK/src/compiler.jl:5434
 [11] thunk_generator(world::UInt64, source::LineNumberNode, FA::Type, A::Type, TT::Type, Mode::Enzyme.API.CDerivativeMode, Width::Int64, ModifiedBetween::NTuple{…} where N, ReturnPrimal::Bool, ShadowInit::Bool, ABI::Type, ErrIfFuncWritten::Bool, RuntimeActivity::Bool, self::Any, fakeworld::Any, fa::Type, a::Type, tt::Type, mode::Type, width::Type, modifiedbetween::Type, returnprimal::Type, shadowinit::Type, abi::Type, erriffuncwritten::Type, runtimeactivity::Type)
    @ Enzyme.Compiler ~/.julia/packages/Enzyme/haqjK/src/compiler.jl:5601
 [12] autodiff
    @ ~/.julia/packages/Enzyme/haqjK/src/Enzyme.jl:485 [inlined]
 [13] autodiff
    @ ~/.julia/packages/Enzyme/haqjK/src/Enzyme.jl:544 [inlined]
 [14] autodiff(::ReverseMode{…}, ::typeof(f!), ::Duplicated{…}, ::Const{…})
    @ Enzyme ~/.julia/packages/Enzyme/haqjK/src/Enzyme.jl:516
 [15] top-level scope
    @ REPL[9]:1
Some type information was truncated. Use `show(err)` to see complete types.
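
For reference, the gradient this reproducer should return can be worked out by hand on the CPU. The sketch below assumes that the ones written into ∂f_∂x act as the cotangent of the updated x and that Enzyme overwrites the shadow with the gradient with respect to the original x. The names x0, seed and expected_grad are illustrative and do not appear in the reproducer.

```julia
# CPU reference for the kernel body: x[i] <- c[i] * x[i]^2.
c = 1:64
x0 = ones(Float32, 64)     # same initial values as CUDA.ones(64)
seed = ones(Float32, 64)   # the reproducer fills the shadow with 1.0

# Forward result of the in-place update.
y = c .* x0 .^ 2

# Each output depends only on its own input, so the Jacobian is diagonal
# with entries 2 * c[i] * x0[i]; the pullback scales the seed elementwise.
expected_grad = 2 .* c .* x0 .* seed

println(expected_grad[1:4])
```

Under those assumptions a working run should leave 2c in ∂f_∂x, that is 2, 4, ..., 128.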
@wsmoses
Member

wsmoses commented Dec 17, 2024

edit: ah yeah as you mentioned it does work on 1.10

Can you retry with Julia 1.10? I think this is an issue with 1.11's gc_loaded

@wsmoses
Member

wsmoses commented Dec 23, 2024

Can you retry this on latest main and see if it still triggers?

@cncastillo
Author

cncastillo commented Dec 23, 2024

I tried the latest stable version (Enzyme v0.13.25), which gives the same error, and the current dev version, which gives a new error:

StackOverflowError:
Stacktrace:
  [1] LLVM.LLVMType(ref::Ptr{LLVM.API.LLVMOpaqueType})
    @ LLVM ~/.julia/packages/LLVM/wMjUU/src/core/type.jl:49
  [2] value_type
    @ ~/.julia/packages/LLVM/wMjUU/src/core/value.jl:54 [inlined]
  [3] (::Enzyme.Compiler.var"#getparent#71"{LLVM.Context, LLVM.Function, LLVM.IntegerType, Int64, Dict{LLVM.PHIInst, LLVM.PHIInst}, Dict{LLVM.PHIInst, LLVM.PHIInst}, LLVM.PHIInst, LLVM.BitCastInst})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:609
  [4] (::Enzyme.Compiler.var"#getparent#71"{LLVM.Context, LLVM.Function, LLVM.IntegerType, Int64, Dict{LLVM.PHIInst, LLVM.PHIInst}, Dict{LLVM.PHIInst, LLVM.PHIInst}, LLVM.PHIInst, LLVM.BitCastInst})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:615
  [5] (::Enzyme.Compiler.var"#getparent#71"{LLVM.Context, LLVM.Function, LLVM.IntegerType, Int64, Dict{LLVM.PHIInst, LLVM.PHIInst}, Dict{LLVM.PHIInst, LLVM.PHIInst}, LLVM.PHIInst, LLVM.BitCastInst})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool) (repeats 10889 times)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:859
  [6] (::Enzyme.Compiler.var"#getparent#71"{LLVM.Context, LLVM.Function, LLVM.IntegerType, Int64, Dict{LLVM.PHIInst, LLVM.PHIInst}, Dict{LLVM.PHIInst, LLVM.PHIInst}, LLVM.PHIInst, LLVM.BitCastInst})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:780
  [7] (::Enzyme.Compiler.var"#getparent#71"{LLVM.Context, LLVM.Function, LLVM.IntegerType, Int64, Dict{LLVM.PHIInst, LLVM.PHIInst}, Dict{LLVM.PHIInst, LLVM.PHIInst}, LLVM.PHIInst, LLVM.BitCastInst})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:644
  [8] (::Enzyme.Compiler.var"#getparent#71"{LLVM.Context, LLVM.Function, LLVM.IntegerType, Int64, Dict{LLVM.PHIInst, LLVM.PHIInst}, Dict{LLVM.PHIInst, LLVM.PHIInst}, LLVM.PHIInst, LLVM.BitCastInst})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:780
  [9] nodecayed_phis!(mod::LLVM.Module)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:933
 [10] optimize!(mod::LLVM.Module, tm::LLVM.TargetMachine)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler/optimize.jl:582
 [11] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:4108
 [12] codegen
    @ ~/.julia/dev/Enzyme/src/compiler.jl:3240 [inlined]
 [13] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5289
 [14] _thunk
    @ ~/.julia/dev/Enzyme/src/compiler.jl:5289 [inlined]
 [15] cached_compilation
    @ ~/.julia/dev/Enzyme/src/compiler.jl:5341 [inlined]
 [16] thunkbase(mi::Core.MethodInstance, World::UInt64, FA::Type{<:Annotation}, A::Type{<:Annotation}, TT::Type, Mode::Enzyme.API.CDerivativeMode, width::Int64, ModifiedBetween::NTuple{N, Bool} where N, ReturnPrimal::Bool, ShadowInit::Bool, ABI::Type, ErrIfFuncWritten::Bool, RuntimeActivity::Bool, edges::Vector{Any})
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5452
 [17] thunk_generator(world::UInt64, source::LineNumberNode, FA::Type, A::Type, TT::Type, Mode::Enzyme.API.CDerivativeMode, Width::Int64, ModifiedBetween::NTuple{N, Bool} where N, ReturnPrimal::Bool, ShadowInit::Bool, ABI::Type, ErrIfFuncWritten::Bool, RuntimeActivity::Bool, self::Any, fakeworld::Any, fa::Type, a::Type, tt::Type, mode::Type, width::Type, modifiedbetween::Type, returnprimal::Type, shadowinit::Type, abi::Type, erriffuncwritten::Type, runtimeactivity::Type)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5637
 [18] autodiff
    @ ~/.julia/dev/Enzyme/src/Enzyme.jl:485 [inlined]
 [19] autodiff
    @ ~/.julia/dev/Enzyme/src/Enzyme.jl:544 [inlined]
 [20] autodiff(::ReverseMode{false, false, FFIABI, false, false}, ::typeof(f!), ::Duplicated{CuArray{Float32, 1, CUDA.DeviceMemory}}, ::Const{CUDABackend})
    @ Enzyme ~/.julia/dev/Enzyme/src/Enzyme.jl:516
 [21] top-level scope
    @ REPL[10]:1

@wsmoses
Member

wsmoses commented Jan 2, 2025

okay can you give this a go again? The getparent stuff should be fixed (I hope) now

@cncastillo
Author

Using the latest dev version I get:

ERROR: Enzyme compilation failed due to an internal error.
 Please open an issue with the code to reproduce and full error log on github.com/EnzymeAD/Enzyme.jl
 To toggle more information for debugging (needed for bug reports), set Enzyme.Compiler.VERBOSE_ERRORS[] = true (default false)

Stacktrace:
 [1] #synchronize#1003
   @ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/synchronization.jl:200
 [2] synchronize (repeats 2 times)
   @ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/synchronization.jl:194
 [3] synchronize
   @ ~/.julia/packages/CUDA/2kjXI/src/CUDAKernels.jl:29
 [4] augmented_primal
   @ ~/.julia/packages/KernelAbstractions/0r40T/ext/EnzymeExt.jl:61

Stacktrace:
  [1] (::Enzyme.Compiler.var"#getparent#69"{})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool, phicache::Dict{…})
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:931
  [2] (::Enzyme.Compiler.var"#getparent#69"{})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool, phicache::Dict{…})
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:615
  [3] (::Enzyme.Compiler.var"#getparent#69"{})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool, phicache::Dict{…})
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:644
  [4] (::Enzyme.Compiler.var"#getparent#69"{})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool, phicache::Dict{…})
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:780
  [5] nodecayed_phis!(mod::LLVM.Module)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:938
  [6] optimize!(mod::LLVM.Module, tm::LLVM.TargetMachine)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler/optimize.jl:582
  [7] nested_codegen!(mode::Enzyme.API.CDerivativeMode, mod::LLVM.Module, funcspec::Core.MethodInstance, world::UInt64)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:401
  [8] enzyme_custom_common_rev(forward::Bool, B::LLVM.IRBuilder, orig::LLVM.CallInst, gutils::Enzyme.Compiler.GradientUtils, normalR::Ptr{…}, shadowR::Ptr{…}, tape::Nothing)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/rules/customrules.jl:960
  [9] enzyme_custom_augfwd
    @ ~/.julia/dev/Enzyme/src/rules/customrules.jl:1503 [inlined]
 [10] enzyme_custom_augfwd_cfunc(B::Ptr{…}, OrigCI::Ptr{…}, gutils::Ptr{…}, normalR::Ptr{…}, shadowR::Ptr{…}, tapeR::Ptr{…})
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/rules/llvmrules.jl:18
 [11] EnzymeCreatePrimalAndGradient(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{…}, TA::Enzyme.TypeAnalysis, returnValue::Bool, dretUsed::Bool, mode::Enzyme.API.CDerivativeMode, runtimeActivity::Bool, width::Int64, additionalArg::Ptr{…}, forceAnonymousTape::Bool, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{…}, augmented::Ptr{…}, atomicAdd::Bool)
    @ Enzyme.API ~/.julia/dev/Enzyme/src/api.jl:268
 [12] enzyme!(job::GPUCompiler.CompilerJob{…}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::NTuple{…} where N, returnPrimal::Bool, expectedTapeType::Type, loweredArgs::Set{…}, boxedArgs::Set{…})
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:1703
 [13] codegen(output::Symbol, job::GPUCompiler.CompilerJob{…}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:4547
 [14] codegen
    @ ~/.julia/dev/Enzyme/src/compiler.jl:3350 [inlined]
 [15] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5407
 [16] _thunk
    @ ~/.julia/dev/Enzyme/src/compiler.jl:5407 [inlined]
 [17] cached_compilation
    @ ~/.julia/dev/Enzyme/src/compiler.jl:5459 [inlined]
 [18] thunkbase(mi::Core.MethodInstance, World::UInt64, FA::Type{…}, A::Type{…}, TT::Type, Mode::Enzyme.API.CDerivativeMode, width::Int64, ModifiedBetween::NTuple{…} where N, ReturnPrimal::Bool, ShadowInit::Bool, ABI::Type, ErrIfFuncWritten::Bool, RuntimeActivity::Bool, edges::Vector{…})
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5570
 [19] thunk_generator(world::UInt64, source::LineNumberNode, FA::Type, A::Type, TT::Type, Mode::Enzyme.API.CDerivativeMode, Width::Int64, ModifiedBetween::NTuple{…} where N, ReturnPrimal::Bool, ShadowInit::Bool, ABI::Type, ErrIfFuncWritten::Bool, RuntimeActivity::Bool, self::Any, fakeworld::Any, fa::Type, a::Type, tt::Type, mode::Type, width::Type, modifiedbetween::Type, returnprimal::Type, shadowinit::Type, abi::Type, erriffuncwritten::Type, runtimeactivity::Type)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5755
 [20] autodiff
    @ ~/.julia/dev/Enzyme/src/Enzyme.jl:485 [inlined]
 [21] autodiff
    @ ~/.julia/dev/Enzyme/src/Enzyme.jl:544 [inlined]
 [22] autodiff(::ReverseMode{…}, ::typeof(f!), ::Duplicated{…}, ::Const{…})
    @ Enzyme ~/.julia/dev/Enzyme/src/Enzyme.jl:516
 [23] top-level scope
    @ REPL[16]:1
Some type information was truncated. Use `show(err)` to see complete types.
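
Since the message asks for a full error log, one way to collect it is to capture the error text and backtrace as a string and save it to a file. This is only a sketch: capture_error_log is a hypothetical helper, not part of Enzyme, and the Enzyme-specific lines are left as comments because they need the reproducer and a GPU. The only Enzyme setting referenced is the VERBOSE_ERRORS toggle named in the message itself.

```julia
# Hypothetical helper (not part of Enzyme): run `thunk` and, if it throws,
# return the error text plus backtrace as a String; return nothing on success.
function capture_error_log(thunk)
    try
        thunk()
        return nothing
    catch err
        return sprint(showerror, err, catch_backtrace())
    end
end

# With the reproducer loaded, the intended use would look like this:
#   Enzyme.Compiler.VERBOSE_ERRORS[] = true
#   errlog = capture_error_log() do
#       Enzyme.autodiff(Reverse, f!, Duplicated(x, ∂f_∂x), Const(backend))
#   end
#   errlog === nothing || write("enzyme_error.log", errlog)

# Stand-in failure so the sketch runs without Enzyme or a GPU.
errlog = capture_error_log(() -> error("stand-in failure"))
println(errlog)
```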

@wsmoses
Member

wsmoses commented Jan 9, 2025

okay my patch to CUDA.jl fixing that has been released, want to give it another go?

@cncastillo
Author

Funnily enough, it is now failing on a totally unrelated line:

∂f_∂x .= 1.0

It says "Not implemented", which is kind of weird.

Anyway, if I use ∂f_∂x = CUDA.ones(64), it takes about 3 minutes to compile and then I get the same error 🤕.

Enzyme compilation failed due to an internal error.
 Please open an issue with the code to reproduce and full error log on github.com/EnzymeAD/Enzyme.jl
 To toggle more information for debugging (needed for bug reports), set Enzyme.Compiler.VERBOSE_ERRORS[] = true (default false)

Stacktrace:
 [1] #synchronize#1003
   @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/synchronization.jl:200
 [2] synchronize (repeats 2 times)
   @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/synchronization.jl:194
 [3] synchronize
   @ ~/.julia/packages/CUDA/1kIOw/src/CUDAKernels.jl:29
 [4] augmented_primal
   @ ~/.julia/packages/KernelAbstractions/0r40T/ext/EnzymeExt.jl:61

I am using:

  • KernelAbstractions.jl v0.9.31
  • Enzyme.jl v0.13.28
  • CUDA.jl v5.6.1 (should include PR 2605)

Linking this PR: JuliaGPU/CUDA.jl#2605
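
A version list like the one above can be generated rather than typed by hand. The following is a minimal sketch using only the Pkg and InteractiveUtils standard libraries. environment_report is a hypothetical helper name, and its default package list simply mirrors the three packages mentioned here.

```julia
using Pkg, InteractiveUtils

# Hypothetical helper: collect the Julia version info and the installed
# versions of the given packages from the active environment into one String.
function environment_report(pkgs = ["Enzyme", "CUDA", "KernelAbstractions"])
    io = IOBuffer()
    versioninfo(io)             # Julia version, OS, CPU, LLVM version
    Pkg.status(pkgs; io = io)   # versions of the listed packages, if installed
    return String(take!(io))
end

print(environment_report())
```

Pasting this output into the issue makes it easier to confirm that a given release, such as the CUDA.jl version that should contain the linked PR, is actually the one in use.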

Complete error:

Enzyme compilation failed due to an internal error.
 Please open an issue with the code to reproduce and full error log on github.com/EnzymeAD/Enzyme.jl
 To toggle more information for debugging (needed for bug reports), set Enzyme.Compiler.VERBOSE_ERRORS[] = true (default false)

Current scope: 

define internal fastcc void @julia_nonblocking_synchronize_86971({} addrspace(10)* noundef nonnull align 8 dereferenceable(40) %0) unnamed_addr #142 !dbg !4415 {
top:
  %1 = alloca [3 x [2 x {} addrspace(10)*]], align 8
  %pgcstack = call {}*** @julia.get_pgcstack()
  %ptls_field6 = getelementptr inbounds {}**, {}*** %pgcstack, i64 2
  %2 = bitcast {}*** %ptls_field6 to i64***
  %ptls_load78 = load i64**, i64*** %2, align 8, !tbaa !263
  %3 = getelementptr inbounds i64*, i64** %ptls_load78, i64 2
  %safepoint = load i64*, i64** %3, align 8, !tbaa !267
  fence syncscope("singlethread") seq_cst
  call void @julia.safepoint(i64* %safepoint), !dbg !4416
  fence syncscope("singlethread") seq_cst
  %4 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef addrspacecast ({}* inttoptr (i64 138703909114864 to {}*) to {} addrspace(11)*)) #311, !dbg !4417
  %ptr.i = bitcast {}* %4 to i32*, !dbg !4421
  %rv.i = atomicrmw add i32* %ptr.i, i32 1 acq_rel, align 4, !dbg !4421
  %5 = and i32 %rv.i, 3, !dbg !4424
  %.not = icmp eq i32 %5, 0, !dbg !4432
  %narrow = select i1 %.not, i32 4, i32 %5, !dbg !4434
  %6 = zext i32 %narrow to i64, !dbg !4434
  %7 = load i64, i64* inttoptr (i64 138703909115008 to i64*), align 128, !dbg !4436, !tbaa !455, !alias.scope !327, !noalias !328
  %8 = add nsw i64 %6, -1, !dbg !4449
  %.not9 = icmp ult i64 %8, %7, !dbg !4452
  br i1 %.not9, label %L40, label %L49, !dbg !4446

L40:                                              ; preds = %top
  %9 = load {} addrspace(10)**, {} addrspace(10)*** inttoptr (i64 138703909114992 to {} addrspace(10)***), align 16, !dbg !4454, !tbaa !324, !alias.scope !327, !noalias !328
  %10 = load {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138703909115000 to {} addrspace(10)**), align 8, !dbg !4454, !tbaa !324, !alias.scope !327, !noalias !328, !dereferenceable_or_null !329, !align !330
  %11 = call {} addrspace(10)* addrspace(13)* @julia.gc_loaded({} addrspace(10)* noundef %10, {} addrspace(10)** noundef %9), !dbg !4457
  %12 = bitcast {} addrspace(10)* addrspace(13)* %11 to [3 x [2 x {} addrspace(10)*]] addrspace(13)*, !dbg !4457
  %13 = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]] addrspace(13)* %12, i64 %8, i64 0, i64 0, !dbg !4457
  %14 = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %13, align 8, !dbg !4457, !tbaa !331, !alias.scope !334, !noalias !335
  %.not24 = icmp eq {} addrspace(10)* %14, null, !dbg !4457
  br i1 %.not24, label %L49, label %pass3, !dbg !4448

L49:                                              ; preds = %L40, %top
  call fastcc void @julia_create_synchronization_worker_88807(i64 signext %6), !dbg !4459
  %.pre = load {} addrspace(10)**, {} addrspace(10)*** inttoptr (i64 138703909114992 to {} addrspace(10)***), align 16, !dbg !4460, !tbaa !324, !alias.scope !327, !noalias !328
  %.pre25 = load {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138703909115000 to {} addrspace(10)**), align 8, !dbg !4460, !tbaa !324, !alias.scope !327, !noalias !328
  %.pre26 = call {} addrspace(10)* addrspace(13)* @julia.gc_loaded({} addrspace(10)* noundef %.pre25, {} addrspace(10)** noundef %.pre), !dbg !4460
  %.pre27 = bitcast {} addrspace(10)* addrspace(13)* %.pre26 to [3 x [2 x {} addrspace(10)*]] addrspace(13)*, !dbg !4460
  %.unpack.elt.phi.trans.insert = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]] addrspace(13)* %.pre27, i64 %8, i64 0, i64 0
  %.unpack.unpack.pre = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %.unpack.elt.phi.trans.insert, align 8, !dbg !4460, !tbaa !331, !alias.scope !334, !noalias !335
  %.not23 = icmp eq {} addrspace(10)* %.unpack.unpack.pre, null, !dbg !4460
  br i1 %.not23, label %fail2, label %pass3, !dbg !4460

L75:                                              ; preds = %pass3
  call fastcc void @julia_throw_api_error_86996(i32 zeroext %18) #312, !dbg !4463
  unreachable, !dbg !4463

L77:                                              ; preds = %pass3
  ret void, !dbg !4464

fail2:                                            ; preds = %L49
  %15 = load {}*, {}** @jl_undefref_exception, align 8, !dbg !4460, !tbaa !267, !alias.scope !309, !noalias !312, !nonnull !262
  %16 = addrspacecast {}* %15 to {} addrspace(12)*, !dbg !4460
  call void @ijl_throw({} addrspace(12)* %16) #312, !dbg !4460
  unreachable, !dbg !4460

pass3:                                            ; preds = %L40, %L49
  %nodecayed..pre-phi2834 = phi {} addrspace(10)* 
  %nodecayedoff..pre-phi2834 = phi i64 
  %.pre-phi2834 = phi [3 x [2 x {} addrspace(10)*]] addrspace(13)* [ %.pre27, %L49 ], [ %12, %L40 ]
  %.unpack.unpack33 = phi {} addrspace(10)* [ %.unpack.unpack.pre, %L49 ], [ %14, %L40 ]
  %.unpack.elt14 = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]] addrspace(13)* %.pre-phi2834, i64 %8, i64 0, i64 1, !dbg !4460
  %.unpack.unpack15 = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %.unpack.elt14, align 8, !dbg !4460, !tbaa !331, !alias.scope !334, !noalias !335
  %.unpack11.elt = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]] addrspace(13)* %.pre-phi2834, i64 %8, i64 1, i64 0, !dbg !4460
  %.unpack11.unpack = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %.unpack11.elt, align 8, !dbg !4460, !tbaa !331, !alias.scope !334, !noalias !335
  %.unpack11.elt17 = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]] addrspace(13)* %.pre-phi2834, i64 %8, i64 1, i64 1, !dbg !4460
  %.unpack11.unpack18 = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %.unpack11.elt17, align 8, !dbg !4460, !tbaa !331, !alias.scope !334, !noalias !335
  %.unpack13.elt = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]] addrspace(13)* %.pre-phi2834, i64 %8, i64 2, i64 0, !dbg !4460
  %.unpack13.unpack = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %.unpack13.elt, align 8, !dbg !4460, !tbaa !331, !alias.scope !334, !noalias !335
  %.unpack13.elt20 = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]] addrspace(13)* %.pre-phi2834, i64 %8, i64 2, i64 1, !dbg !4460
  %.unpack13.unpack21 = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %.unpack13.elt20, align 8, !dbg !4460, !tbaa !331, !alias.scope !334, !noalias !335
  %.fca.0.0.gep = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]]* %1, i64 0, i64 0, i64 0, !dbg !4465
  store {} addrspace(10)* %.unpack.unpack33, {} addrspace(10)** %.fca.0.0.gep, align 8, !dbg !4465, !noalias !340
  %.fca.0.1.gep = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]]* %1, i64 0, i64 0, i64 1, !dbg !4465
  store {} addrspace(10)* %.unpack.unpack15, {} addrspace(10)** %.fca.0.1.gep, align 8, !dbg !4465, !noalias !340
  %.fca.1.0.gep = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]]* %1, i64 0, i64 1, i64 0, !dbg !4465
  store {} addrspace(10)* %.unpack11.unpack, {} addrspace(10)** %.fca.1.0.gep, align 8, !dbg !4465, !noalias !340
  %.fca.1.1.gep = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]]* %1, i64 0, i64 1, i64 1, !dbg !4465
  store {} addrspace(10)* %.unpack11.unpack18, {} addrspace(10)** %.fca.1.1.gep, align 8, !dbg !4465, !noalias !340
  %.fca.2.0.gep = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]]* %1, i64 0, i64 2, i64 0, !dbg !4465
  store {} addrspace(10)* %.unpack13.unpack, {} addrspace(10)** %.fca.2.0.gep, align 8, !dbg !4465, !noalias !340
  %.fca.2.1.gep = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]]* %1, i64 0, i64 2, i64 1, !dbg !4465
  store {} addrspace(10)* %.unpack13.unpack21, {} addrspace(10)** %.fca.2.1.gep, align 8, !dbg !4465, !noalias !340
  %17 = addrspacecast [3 x [2 x {} addrspace(10)*]]* %1 to [3 x [2 x {} addrspace(10)*]] addrspace(11)*, !dbg !4465
  %18 = call fastcc i32 @julia_put__88773([3 x [2 x {} addrspace(10)*]] addrspace(11)* nocapture noundef nonnull readonly align 8 dereferenceable(48) %17, {} addrspace(10)* noundef nonnull align 8 dereferenceable(40) %0), !dbg !4465
  %19 = icmp eq i32 %18, 0, !dbg !4466
  br i1 %19, label %L77, label %L75, !dbg !4471
}


Could not analyze garbage collection behavior of
 inst:   %.pre-phi2834 = phi [3 x [2 x {} addrspace(10)*]] addrspace(13)* [ %.pre27, %L49 ], [ %12, %L40 ]
 v0:   %.pre27 = bitcast {} addrspace(10)* addrspace(13)* %.pre26 to [3 x [2 x {} addrspace(10)*]] addrspace(13)*, !dbg !343
 v: {} addrspace(10)*** inttoptr (i64 138703909114992 to {} addrspace(10)***)
 offset: i64 0
 hasload: true

Stacktrace:
 [1] #synchronize#1003
   @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/synchronization.jl:200
 [2] synchronize (repeats 2 times)
   @ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/synchronization.jl:194
 [3] synchronize
   @ ~/.julia/packages/CUDA/1kIOw/src/CUDAKernels.jl:29
 [4] augmented_primal
   @ ~/.julia/packages/KernelAbstractions/0r40T/ext/EnzymeExt.jl:61
