Enzyme fails to differentiate KA.jl kernel in Julia 1.11 #2198
Can you retry with Julia 1.10? I think this is an issue with 1.11's gc_loaded.
Edit: ah yeah, as you mentioned, it does work on 1.10.
Can you retry this on latest main and see if it still triggers?
I tried with the latest stable version (Enzyme v0.13.25, same error) and with the current dev version, where I am getting a new error:
StackOverflowError:
Stacktrace:
[1] LLVM.LLVMType(ref::Ptr{LLVM.API.LLVMOpaqueType})
@ LLVM ~/.julia/packages/LLVM/wMjUU/src/core/type.jl:49
[2] value_type
@ ~/.julia/packages/LLVM/wMjUU/src/core/value.jl:54 [inlined]
[3] (::Enzyme.Compiler.var"#getparent#71"{LLVM.Context, LLVM.Function, LLVM.IntegerType, Int64, Dict{LLVM.PHIInst, LLVM.PHIInst}, Dict{LLVM.PHIInst, LLVM.PHIInst}, LLVM.PHIInst, LLVM.BitCastInst})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:609
[4] (::Enzyme.Compiler.var"#getparent#71"{LLVM.Context, LLVM.Function, LLVM.IntegerType, Int64, Dict{LLVM.PHIInst, LLVM.PHIInst}, Dict{LLVM.PHIInst, LLVM.PHIInst}, LLVM.PHIInst, LLVM.BitCastInst})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:615
[5] (::Enzyme.Compiler.var"#getparent#71"{LLVM.Context, LLVM.Function, LLVM.IntegerType, Int64, Dict{LLVM.PHIInst, LLVM.PHIInst}, Dict{LLVM.PHIInst, LLVM.PHIInst}, LLVM.PHIInst, LLVM.BitCastInst})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool) (repeats 10889 times)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:859
[6] (::Enzyme.Compiler.var"#getparent#71"{LLVM.Context, LLVM.Function, LLVM.IntegerType, Int64, Dict{LLVM.PHIInst, LLVM.PHIInst}, Dict{LLVM.PHIInst, LLVM.PHIInst}, LLVM.PHIInst, LLVM.BitCastInst})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:780
[7] (::Enzyme.Compiler.var"#getparent#71"{LLVM.Context, LLVM.Function, LLVM.IntegerType, Int64, Dict{LLVM.PHIInst, LLVM.PHIInst}, Dict{LLVM.PHIInst, LLVM.PHIInst}, LLVM.PHIInst, LLVM.BitCastInst})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:644
[8] (::Enzyme.Compiler.var"#getparent#71"{LLVM.Context, LLVM.Function, LLVM.IntegerType, Int64, Dict{LLVM.PHIInst, LLVM.PHIInst}, Dict{LLVM.PHIInst, LLVM.PHIInst}, LLVM.PHIInst, LLVM.BitCastInst})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:780
[9] nodecayed_phis!(mod::LLVM.Module)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:933
[10] optimize!(mod::LLVM.Module, tm::LLVM.TargetMachine)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler/optimize.jl:582
[11] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:4108
[12] codegen
@ ~/.julia/dev/Enzyme/src/compiler.jl:3240 [inlined]
[13] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5289
[14] _thunk
@ ~/.julia/dev/Enzyme/src/compiler.jl:5289 [inlined]
[15] cached_compilation
@ ~/.julia/dev/Enzyme/src/compiler.jl:5341 [inlined]
[16] thunkbase(mi::Core.MethodInstance, World::UInt64, FA::Type{<:Annotation}, A::Type{<:Annotation}, TT::Type, Mode::Enzyme.API.CDerivativeMode, width::Int64, ModifiedBetween::NTuple{N, Bool} where N, ReturnPrimal::Bool, ShadowInit::Bool, ABI::Type, ErrIfFuncWritten::Bool, RuntimeActivity::Bool, edges::Vector{Any})
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5452
[17] thunk_generator(world::UInt64, source::LineNumberNode, FA::Type, A::Type, TT::Type, Mode::Enzyme.API.CDerivativeMode, Width::Int64, ModifiedBetween::NTuple{N, Bool} where N, ReturnPrimal::Bool, ShadowInit::Bool, ABI::Type, ErrIfFuncWritten::Bool, RuntimeActivity::Bool, self::Any, fakeworld::Any, fa::Type, a::Type, tt::Type, mode::Type, width::Type, modifiedbetween::Type, returnprimal::Type, shadowinit::Type, abi::Type, erriffuncwritten::Type, runtimeactivity::Type)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5637
[18] autodiff
@ ~/.julia/dev/Enzyme/src/Enzyme.jl:485 [inlined]
[19] autodiff
@ ~/.julia/dev/Enzyme/src/Enzyme.jl:544 [inlined]
[20] autodiff(::ReverseMode{false, false, FFIABI, false, false}, ::typeof(f!), ::Duplicated{CuArray{Float32, 1, CUDA.DeviceMemory}}, ::Const{CUDABackend})
@ Enzyme ~/.julia/dev/Enzyme/src/Enzyme.jl:516
[21] top-level scope
@ REPL[10]:1
Okay, can you give this another go? The getparent stuff should (I hope) be fixed now.
Using the latest dev version I get:
ERROR: Enzyme compilation failed due to an internal error.
Please open an issue with the code to reproduce and full error log on github.com/EnzymeAD/Enzyme.jl
To toggle more information for debugging (needed for bug reports), set Enzyme.Compiler.VERBOSE_ERRORS[] = true (default false)
Stacktrace:
[1] #synchronize#1003
@ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/synchronization.jl:200
[2] synchronize (repeats 2 times)
@ ~/.julia/packages/CUDA/2kjXI/lib/cudadrv/synchronization.jl:194
[3] synchronize
@ ~/.julia/packages/CUDA/2kjXI/src/CUDAKernels.jl:29
[4] augmented_primal
@ ~/.julia/packages/KernelAbstractions/0r40T/ext/EnzymeExt.jl:61
Stacktrace:
[1] (::Enzyme.Compiler.var"#getparent#69"{…})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool, phicache::Dict{…})
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:931
[2] (::Enzyme.Compiler.var"#getparent#69"{…})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool, phicache::Dict{…})
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:615
[3] (::Enzyme.Compiler.var"#getparent#69"{…})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool, phicache::Dict{…})
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:644
[4] (::Enzyme.Compiler.var"#getparent#69"{…})(b::LLVM.IRBuilder, v::LLVM.Value, offset::LLVM.Value, hasload::Bool, phicache::Dict{…})
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:780
[5] nodecayed_phis!(mod::LLVM.Module)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/llvm/transforms.jl:938
[6] optimize!(mod::LLVM.Module, tm::LLVM.TargetMachine)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler/optimize.jl:582
[7] nested_codegen!(mode::Enzyme.API.CDerivativeMode, mod::LLVM.Module, funcspec::Core.MethodInstance, world::UInt64)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:401
[8] enzyme_custom_common_rev(forward::Bool, B::LLVM.IRBuilder, orig::LLVM.CallInst, gutils::Enzyme.Compiler.GradientUtils, normalR::Ptr{…}, shadowR::Ptr{…}, tape::Nothing)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/rules/customrules.jl:960
[9] enzyme_custom_augfwd
@ ~/.julia/dev/Enzyme/src/rules/customrules.jl:1503 [inlined]
[10] enzyme_custom_augfwd_cfunc(B::Ptr{…}, OrigCI::Ptr{…}, gutils::Ptr{…}, normalR::Ptr{…}, shadowR::Ptr{…}, tapeR::Ptr{…})
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/rules/llvmrules.jl:18
[11] EnzymeCreatePrimalAndGradient(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{…}, TA::Enzyme.TypeAnalysis, returnValue::Bool, dretUsed::Bool, mode::Enzyme.API.CDerivativeMode, runtimeActivity::Bool, width::Int64, additionalArg::Ptr{…}, forceAnonymousTape::Bool, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{…}, augmented::Ptr{…}, atomicAdd::Bool)
@ Enzyme.API ~/.julia/dev/Enzyme/src/api.jl:268
[12] enzyme!(job::GPUCompiler.CompilerJob{…}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::NTuple{…} where N, returnPrimal::Bool, expectedTapeType::Type, loweredArgs::Set{…}, boxedArgs::Set{…})
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:1703
[13] codegen(output::Symbol, job::GPUCompiler.CompilerJob{…}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:4547
[14] codegen
@ ~/.julia/dev/Enzyme/src/compiler.jl:3350 [inlined]
[15] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5407
[16] _thunk
@ ~/.julia/dev/Enzyme/src/compiler.jl:5407 [inlined]
[17] cached_compilation
@ ~/.julia/dev/Enzyme/src/compiler.jl:5459 [inlined]
[18] thunkbase(mi::Core.MethodInstance, World::UInt64, FA::Type{…}, A::Type{…}, TT::Type, Mode::Enzyme.API.CDerivativeMode, width::Int64, ModifiedBetween::NTuple{…} where N, ReturnPrimal::Bool, ShadowInit::Bool, ABI::Type, ErrIfFuncWritten::Bool, RuntimeActivity::Bool, edges::Vector{…})
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5570
[19] thunk_generator(world::UInt64, source::LineNumberNode, FA::Type, A::Type, TT::Type, Mode::Enzyme.API.CDerivativeMode, Width::Int64, ModifiedBetween::NTuple{…} where N, ReturnPrimal::Bool, ShadowInit::Bool, ABI::Type, ErrIfFuncWritten::Bool, RuntimeActivity::Bool, self::Any, fakeworld::Any, fa::Type, a::Type, tt::Type, mode::Type, width::Type, modifiedbetween::Type, returnprimal::Type, shadowinit::Type, abi::Type, erriffuncwritten::Type, runtimeactivity::Type)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5755
[20] autodiff
@ ~/.julia/dev/Enzyme/src/Enzyme.jl:485 [inlined]
[21] autodiff
@ ~/.julia/dev/Enzyme/src/Enzyme.jl:544 [inlined]
[22] autodiff(::ReverseMode{…}, ::typeof(f!), ::Duplicated{…}, ::Const{…})
@ Enzyme ~/.julia/dev/Enzyme/src/Enzyme.jl:516
[23] top-level scope
@ REPL[16]:1
Some type information was truncated. Use `show(err)` to see complete types.
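For readers comparing the two traces: in the first, the getparent frame repeats 10889 times before the stack overflows, and in the second the same closure takes an extra phicache::Dict{…} argument. One plausible reading, which is an assumption and not something confirmed in this thread, is that the fix remembers PHI nodes it has already visited so that a loop-carried PHI is not followed forever. The toy Julia sketch below illustrates only that general idea. `Node`, `roots!`, and the three-node graph are hypothetical and are not Enzyme's actual data structures or code.

```julia
# Toy value graph (hypothetical): each node lists the indices of the nodes it is derived from.
struct Node
    name::String
    inputs::Vector{Int}
end

# Collect the names of input-free nodes reachable from graph[i].
# `seen` plays the role of a visit cache: a node already visited is not
# descended into again, so a cycle through a PHI terminates.
function roots!(found::Vector{String}, seen::Set{Int}, graph::Vector{Node}, i::Int)
    i in seen && return found      # already visited: stop instead of recursing forever
    push!(seen, i)
    node = graph[i]
    if isempty(node.inputs)
        push!(found, node.name)
    else
        for j in node.inputs
            roots!(found, seen, graph, j)
        end
    end
    return found
end

graph = [
    Node("base", Int[]),   # 1: no inputs
    Node("phi", [1, 3]),   # 2: PHI fed by the base value and by the loop-carried value
    Node("next", [2]),     # 3: derived from the PHI, closing the cycle 2 -> 3 -> 2
]

found = roots!(String[], Set{Int}(), graph, 2)
println(found)
```

Without the `seen` check, the walk from node 2 would bounce between nodes 2 and 3 until the stack is exhausted, which is the shape of failure the first trace shows.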
Okay, my patch to CUDA.jl fixing that has been released. Want to give it another go?
Funnily enough, now it is failing in a totally unrelated line:
∂f_∂x .= 1.0
It says
Anyway, if I use
Enzyme compilation failed due to an internal error.
Please open an issue with the code to reproduce and full error log on github.com/EnzymeAD/Enzyme.jl
To toggle more information for debugging (needed for bug reports), set Enzyme.Compiler.VERBOSE_ERRORS[] = true (default false)
Stacktrace:
[1] #synchronize#1003
@ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/synchronization.jl:200
[2] synchronize (repeats 2 times)
@ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/synchronization.jl:194
[3] synchronize
@ ~/.julia/packages/CUDA/1kIOw/src/CUDAKernels.jl:29
[4] augmented_primal
@ ~/.julia/packages/KernelAbstractions/0r40T/ext/EnzymeExt.jl:61
I am using:
Linking this PR: JuliaGPU/CUDA.jl#2605
Complete error:
Enzyme compilation failed due to an internal error.
Please open an issue with the code to reproduce and full error log on github.com/EnzymeAD/Enzyme.jl
To toggle more information for debugging (needed for bug reports), set Enzyme.Compiler.VERBOSE_ERRORS[] = true (default false)
Current scope:
define internal fastcc void @julia_nonblocking_synchronize_86971({} addrspace(10)* noundef nonnull align 8 dereferenceable(40) %0) unnamed_addr #142 !dbg !4415 {
top:
%1 = alloca [3 x [2 x {} addrspace(10)*]], align 8
%pgcstack = call {}*** @julia.get_pgcstack()
%ptls_field6 = getelementptr inbounds {}**, {}*** %pgcstack, i64 2
%2 = bitcast {}*** %ptls_field6 to i64***
%ptls_load78 = load i64**, i64*** %2, align 8, !tbaa !263
%3 = getelementptr inbounds i64*, i64** %ptls_load78, i64 2
%safepoint = load i64*, i64** %3, align 8, !tbaa !267
fence syncscope("singlethread") seq_cst
call void @julia.safepoint(i64* %safepoint), !dbg !4416
fence syncscope("singlethread") seq_cst
%4 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef addrspacecast ({}* inttoptr (i64 138703909114864 to {}*) to {} addrspace(11)*)) #311, !dbg !4417
%ptr.i = bitcast {}* %4 to i32*, !dbg !4421
%rv.i = atomicrmw add i32* %ptr.i, i32 1 acq_rel, align 4, !dbg !4421
%5 = and i32 %rv.i, 3, !dbg !4424
%.not = icmp eq i32 %5, 0, !dbg !4432
%narrow = select i1 %.not, i32 4, i32 %5, !dbg !4434
%6 = zext i32 %narrow to i64, !dbg !4434
%7 = load i64, i64* inttoptr (i64 138703909115008 to i64*), align 128, !dbg !4436, !tbaa !455, !alias.scope !327, !noalias !328
%8 = add nsw i64 %6, -1, !dbg !4449
%.not9 = icmp ult i64 %8, %7, !dbg !4452
br i1 %.not9, label %L40, label %L49, !dbg !4446
L40: ; preds = %top
%9 = load {} addrspace(10)**, {} addrspace(10)*** inttoptr (i64 138703909114992 to {} addrspace(10)***), align 16, !dbg !4454, !tbaa !324, !alias.scope !327, !noalias !328
%10 = load {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138703909115000 to {} addrspace(10)**), align 8, !dbg !4454, !tbaa !324, !alias.scope !327, !noalias !328, !dereferenceable_or_null !329, !align !330
%11 = call {} addrspace(10)* addrspace(13)* @julia.gc_loaded({} addrspace(10)* noundef %10, {} addrspace(10)** noundef %9), !dbg !4457
%12 = bitcast {} addrspace(10)* addrspace(13)* %11 to [3 x [2 x {} addrspace(10)*]] addrspace(13)*, !dbg !4457
%13 = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]] addrspace(13)* %12, i64 %8, i64 0, i64 0, !dbg !4457
%14 = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %13, align 8, !dbg !4457, !tbaa !331, !alias.scope !334, !noalias !335
%.not24 = icmp eq {} addrspace(10)* %14, null, !dbg !4457
br i1 %.not24, label %L49, label %pass3, !dbg !4448
L49: ; preds = %L40, %top
call fastcc void @julia_create_synchronization_worker_88807(i64 signext %6), !dbg !4459
%.pre = load {} addrspace(10)**, {} addrspace(10)*** inttoptr (i64 138703909114992 to {} addrspace(10)***), align 16, !dbg !4460, !tbaa !324, !alias.scope !327, !noalias !328
%.pre25 = load {} addrspace(10)*, {} addrspace(10)** inttoptr (i64 138703909115000 to {} addrspace(10)**), align 8, !dbg !4460, !tbaa !324, !alias.scope !327, !noalias !328
%.pre26 = call {} addrspace(10)* addrspace(13)* @julia.gc_loaded({} addrspace(10)* noundef %.pre25, {} addrspace(10)** noundef %.pre), !dbg !4460
%.pre27 = bitcast {} addrspace(10)* addrspace(13)* %.pre26 to [3 x [2 x {} addrspace(10)*]] addrspace(13)*, !dbg !4460
%.unpack.elt.phi.trans.insert = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]] addrspace(13)* %.pre27, i64 %8, i64 0, i64 0
%.unpack.unpack.pre = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %.unpack.elt.phi.trans.insert, align 8, !dbg !4460, !tbaa !331, !alias.scope !334, !noalias !335
%.not23 = icmp eq {} addrspace(10)* %.unpack.unpack.pre, null, !dbg !4460
br i1 %.not23, label %fail2, label %pass3, !dbg !4460
L75: ; preds = %pass3
call fastcc void @julia_throw_api_error_86996(i32 zeroext %18) #312, !dbg !4463
unreachable, !dbg !4463
L77: ; preds = %pass3
ret void, !dbg !4464
fail2: ; preds = %L49
%15 = load {}*, {}** @jl_undefref_exception, align 8, !dbg !4460, !tbaa !267, !alias.scope !309, !noalias !312, !nonnull !262
%16 = addrspacecast {}* %15 to {} addrspace(12)*, !dbg !4460
call void @ijl_throw({} addrspace(12)* %16) #312, !dbg !4460
unreachable, !dbg !4460
pass3: ; preds = %L40, %L49
%nodecayed..pre-phi2834 = phi {} addrspace(10)*
%nodecayedoff..pre-phi2834 = phi i64
%.pre-phi2834 = phi [3 x [2 x {} addrspace(10)*]] addrspace(13)* [ %.pre27, %L49 ], [ %12, %L40 ]
%.unpack.unpack33 = phi {} addrspace(10)* [ %.unpack.unpack.pre, %L49 ], [ %14, %L40 ]
%.unpack.elt14 = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]] addrspace(13)* %.pre-phi2834, i64 %8, i64 0, i64 1, !dbg !4460
%.unpack.unpack15 = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %.unpack.elt14, align 8, !dbg !4460, !tbaa !331, !alias.scope !334, !noalias !335
%.unpack11.elt = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]] addrspace(13)* %.pre-phi2834, i64 %8, i64 1, i64 0, !dbg !4460
%.unpack11.unpack = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %.unpack11.elt, align 8, !dbg !4460, !tbaa !331, !alias.scope !334, !noalias !335
%.unpack11.elt17 = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]] addrspace(13)* %.pre-phi2834, i64 %8, i64 1, i64 1, !dbg !4460
%.unpack11.unpack18 = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %.unpack11.elt17, align 8, !dbg !4460, !tbaa !331, !alias.scope !334, !noalias !335
%.unpack13.elt = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]] addrspace(13)* %.pre-phi2834, i64 %8, i64 2, i64 0, !dbg !4460
%.unpack13.unpack = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %.unpack13.elt, align 8, !dbg !4460, !tbaa !331, !alias.scope !334, !noalias !335
%.unpack13.elt20 = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]] addrspace(13)* %.pre-phi2834, i64 %8, i64 2, i64 1, !dbg !4460
%.unpack13.unpack21 = load {} addrspace(10)*, {} addrspace(10)* addrspace(13)* %.unpack13.elt20, align 8, !dbg !4460, !tbaa !331, !alias.scope !334, !noalias !335
%.fca.0.0.gep = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]]* %1, i64 0, i64 0, i64 0, !dbg !4465
store {} addrspace(10)* %.unpack.unpack33, {} addrspace(10)** %.fca.0.0.gep, align 8, !dbg !4465, !noalias !340
%.fca.0.1.gep = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]]* %1, i64 0, i64 0, i64 1, !dbg !4465
store {} addrspace(10)* %.unpack.unpack15, {} addrspace(10)** %.fca.0.1.gep, align 8, !dbg !4465, !noalias !340
%.fca.1.0.gep = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]]* %1, i64 0, i64 1, i64 0, !dbg !4465
store {} addrspace(10)* %.unpack11.unpack, {} addrspace(10)** %.fca.1.0.gep, align 8, !dbg !4465, !noalias !340
%.fca.1.1.gep = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]]* %1, i64 0, i64 1, i64 1, !dbg !4465
store {} addrspace(10)* %.unpack11.unpack18, {} addrspace(10)** %.fca.1.1.gep, align 8, !dbg !4465, !noalias !340
%.fca.2.0.gep = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]]* %1, i64 0, i64 2, i64 0, !dbg !4465
store {} addrspace(10)* %.unpack13.unpack, {} addrspace(10)** %.fca.2.0.gep, align 8, !dbg !4465, !noalias !340
%.fca.2.1.gep = getelementptr inbounds [3 x [2 x {} addrspace(10)*]], [3 x [2 x {} addrspace(10)*]]* %1, i64 0, i64 2, i64 1, !dbg !4465
store {} addrspace(10)* %.unpack13.unpack21, {} addrspace(10)** %.fca.2.1.gep, align 8, !dbg !4465, !noalias !340
%17 = addrspacecast [3 x [2 x {} addrspace(10)*]]* %1 to [3 x [2 x {} addrspace(10)*]] addrspace(11)*, !dbg !4465
%18 = call fastcc i32 @julia_put__88773([3 x [2 x {} addrspace(10)*]] addrspace(11)* nocapture noundef nonnull readonly align 8 dereferenceable(48) %17, {} addrspace(10)* noundef nonnull align 8 dereferenceable(40) %0), !dbg !4465
%19 = icmp eq i32 %18, 0, !dbg !4466
br i1 %19, label %L77, label %L75, !dbg !4471
}
Could not analyze garbage collection behavior of
inst: %.pre-phi2834 = phi [3 x [2 x {} addrspace(10)*]] addrspace(13)* [ %.pre27, %L49 ], [ %12, %L40 ]
v0: %.pre27 = bitcast {} addrspace(10)* addrspace(13)* %.pre26 to [3 x [2 x {} addrspace(10)*]] addrspace(13)*, !dbg !343
v: {} addrspace(10)*** inttoptr (i64 138703909114992 to {} addrspace(10)***)
offset: i64 0
hasload: true
Stacktrace:
[1] #synchronize#1003
@ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/synchronization.jl:200
[2] synchronize (repeats 2 times)
@ ~/.julia/packages/CUDA/1kIOw/lib/cudadrv/synchronization.jl:194
[3] synchronize
@ ~/.julia/packages/CUDA/1kIOw/src/CUDAKernels.jl:29
[4] augmented_primal
@ ~/.julia/packages/KernelAbstractions/0r40T/ext/EnzymeExt.jl:61
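The error text above asks for `Enzyme.Compiler.VERBOSE_ERRORS[] = true` when filing a report, and the earlier trace suggests `show(err)` for untruncated types. A minimal sketch of collecting such a log is below. The `square` function is a hypothetical CPU-only stand-in, not the `f!` kernel from this issue, so it is expected to differentiate without error; swap in the failing call to capture the real log.

```julia
using Enzyme

# Ask Enzyme to print the full error log, as the message above requests for bug reports.
Enzyme.Compiler.VERBOSE_ERRORS[] = true

# Hypothetical stand-in for the failing call; replace with the real reproducer.
square(x) = x^2

grads = try
    Enzyme.autodiff(Reverse, square, Active, Active(3.0))
catch err
    show(err)   # prints the error with complete, untruncated types
    nothing
end
```

Setting the flag before the first `autodiff` call matters, since the flag is only consulted when Enzyme compiles and reports the failure.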
Hi! First of all, thank you very much for this amazing package 😄
I have been struggling to get this simple example to work in Julia 1.11.2 (it works in Julia 1.10.7):
When running this code I get: