Test removal of LoopVectorization.jl #83
On my machine, I do see a difference in a micro benchmark:

```julia
julia> using LoopVectorization, Chairmarks

julia> function fast_findmin(dij, n)
           # findmin(@inbounds @view dij[1:n])
           best = 1
           @inbounds dij_min = dij[1]
           @turbo for here in 2:n
               newmin = dij[here] < dij_min
               best = newmin ? here : best
               dij_min = newmin ? dij[here] : dij_min
           end
           dij_min, best
       end

julia> function setup_bench()
           N = rand(200:500)
           dij = abs.(rand(N))
           first_n = rand(1:N)
           dij, first_n
       end

julia> function basic_findmin(dij, n)
           # findmin(@inbounds @view dij[1:n])
           best = 1
           @inbounds dij_min = dij[1]
           for here in 2:n
               newmin = dij[here] < dij_min
               best = newmin ? here : best
               dij_min = newmin ? dij[here] : dij_min
           end
           dij_min, best
       end

julia> @be setup_bench fast_findmin(_...) evals=1
Benchmark: 24004 samples with 1 evaluation
 min    50.000 ns
 median 90.000 ns
 mean   92.675 ns
 max    681.000 ns

julia> @be setup_bench fast_findmin(_...) evals=1
Benchmark: 20405 samples with 1 evaluation
 min    50.000 ns
 median 90.000 ns
 mean   92.634 ns
 max    1.442 μs

julia> @be setup_bench basic_findmin(_...) evals=1
Benchmark: 7178 samples with 1 evaluation
 min    40.000 ns
 median 240.000 ns
 mean   255.817 ns
 max    782.000 ns

julia> @be setup_bench basic_findmin(_...) evals=1
Benchmark: 25737 samples with 1 evaluation
 min    40.000 ns
 median 241.000 ns
 mean   260.301 ns
 max    14.146 μs
```
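As a quick correctness cross-check (not in the original thread, just a sketch): the plain scalar loop should agree exactly with `Base.findmin` on the active prefix `dij[1:n]`:

```julia
# basic_findmin as quoted above (no @turbo, so no LoopVectorization needed).
function basic_findmin(dij, n)
    best = 1
    @inbounds dij_min = dij[1]
    for here in 2:n
        newmin = dij[here] < dij_min
        best = newmin ? here : best
        dij_min = newmin ? dij[here] : dij_min
    end
    dij_min, best
end

# Compare against Base.findmin over the same prefix for a few lengths.
dij = abs.(rand(400))
for n in (1, 7, 256, 400)
    @assert basic_findmin(dij, n) == findmin(@view dij[1:n])
end
```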
Hi @Moelf - yes, this has been on my TODO as well, as we might lose performance. Just running that now on my M2 Pro I get the following:

[Julia benchmark output not captured in this extract]
Yeah, I can reproduce that. I also opened: https://discourse.julialang.org/t/faster-findmin-without-loopvectorization-jl/121742/3
I saw - thanks. I will pitch in there later.
```julia
function naive_findmin(dij, n)
    # Restrict to the active prefix; the snippet as originally posted
    # folded over the whole array and ignored `n`.
    v = @view dij[1:n]
    x = @fastmath foldl(min, v)
    i = findfirst(==(x), v)::Int
    x, i
end
```

looks quite promising
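A quick sanity check on this fold-then-findfirst variant (a sketch: the fold is restricted to `dij[1:n]`, and inputs are assumed NaN-free, since both `@fastmath min` and `==` behave badly with NaN). The exact-equality lookup is safe because `x` is itself one of the elements of the view:

```julia
function naive_findmin(dij, n)
    v = @view dij[1:n]
    x = @fastmath foldl(min, v)
    i = findfirst(==(x), v)::Int
    x, i
end

# Must match Base.findmin on the same prefix.
dij = abs.(rand(400))
for n in (1, 50, 400)
    @assert naive_findmin(dij, n) == findmin(@view dij[1:n])
end
```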
Posted to Discourse, but also for here...

Thanks @jling for bringing this up. I have been very interested in the discussion. Just to give a bit of context here, in this problem the metric […]. This is why the realistic test is to take a random array of length, say, 400. Find the minimum, then reduce to 399, 398, 397, ..., 1. Adding some update to the array values each iteration makes it even more realistic (prevents unrealistic caching).

This is my wrapper to test any findmin implementation:

```julia
function run_descent(v::DenseVector{Float64}, f::T; perturb = 0) where T
    # Ensure we do something with the calculation to prevent the
    # compiler from optimizing everything away!
    sum = 0.0
    for n in length(v):-1:2
        val, _ = f(v, n)
        sum += val
        # If one wants to perturb the array do it like this, which
        # is a proxy for changing values as the algorithm progresses.
        for _ in 1:min(perturb, n)
            v[rand(1:n)] = abs(rand())
        end
    end
    sum
end
```

I introduced the trick with […]. I wrapped up all of the suggestions so far into a script and this is what I get on two machines:

OS X 14.7, M2 Pro: [results not captured in this extract]

Alma9, Ryzen 7 5700G: [results not captured in this extract]

The platform differences are revealing, particularly that the […]. Nothing is quite beating the […]
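To illustrate how the harness is driven (a sketch, not from the original thread: `basic_findmin` is the plain-loop version quoted earlier, and the perturbation count of 2 is an arbitrary choice):

```julia
# Plain-loop findmin over the first n elements, as quoted earlier.
function basic_findmin(dij, n)
    best = 1
    @inbounds dij_min = dij[1]
    for here in 2:n
        newmin = dij[here] < dij_min
        best = newmin ? here : best
        dij_min = newmin ? dij[here] : dij_min
    end
    dij_min, best
end

# The descent harness from the comment above.
function run_descent(v::DenseVector{Float64}, f::T; perturb = 0) where T
    sum = 0.0
    for n in length(v):-1:2
        val, _ = f(v, n)
        sum += val
        for _ in 1:min(perturb, n)
            v[rand(1:n)] = abs(rand())
        end
    end
    sum
end

v = abs.(rand(400))
s  = run_descent(copy(v), basic_findmin)               # static array
sp = run_descent(copy(v), basic_findmin; perturb = 2)  # with perturbation
# 399 minima of values in [0, 1): both sums are small and non-negative.
@assert 0.0 <= s < 400.0 && 0.0 <= sp < 400.0
```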
Microbenchmarks can be misleading, so my next test is in the real code paths of […]
@aoanla was kind enough to test these updated […]
Since LV.jl is going away in 1.12, maybe we should attempt to remove it.
I'm looking at https://github.com/graeme-a-stewart/JetReconstructionBenchmarks.jl/ and want to find a small set of benchmarks I can run to monitor performance when making changes. @graeme-a-stewart, can you help point out a minimal set of things to run using that repo? It's not completely clear what to run (e.g. which example snippets).