Performance regression for BitMatrix multiplication in 1.11.2 #56954

Open
jonocarroll opened this issue Jan 5, 2025 · 4 comments
Labels: arrays, linear algebra, performance, regression 1.11

Comments


jonocarroll commented Jan 5, 2025

I believe there may be a significant performance regression between 1.11.1 and 1.11.2. I ran into it after upgrading and have narrowed it down to the matrix multiplication of two large BitMatrix objects.

I measured the following (each timing taken after running the multiplication a couple of times first):

1.11.0:

a = BitMatrix(undef, (3000, 3000));
@time a * a
  0.014537 seconds (3 allocations: 68.665 MiB, 13.85% gc time)

1.11.1:

a = BitMatrix(undef, (3000, 3000));
@time a * a
  0.018001 seconds (3 allocations: 68.665 MiB, 19.69% gc time)

1.11.2:

a = BitMatrix(undef, (3000, 3000));
@time a * a
 11.244051 seconds (3 allocations: 68.665 MiB, 0.02% gc time)

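GC time drops sharply as a percentage, but only because the runtime is far longer: 11.24 s in 1.11.2 versus 0.015 to 0.018 s in 1.11.0 and 1.11.1, roughly 600 to 800 times slower.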

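I also profiled a fuller example (the one where I first hit this) and found a large increase in time attributed to setindex!. In 1.11.2, 14243 of the 15355 samples under * fall in setindex! (array.jl:994), compared with 4 of 10 (array.jl:983) in 1.11.0: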

1.11.0:

   ╎    ╎    ╎    ╎    ╎    ╎   10  …c/matmul.jl:114; *
   ╎    ╎    ╎    ╎    ╎    ╎    1   …c/matmul.jl:117; matprod_dest
   ╎    ╎    ╎    ╎    ╎    ╎     1   …bitarray.jl:375; similar
   ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1   …ase/boot.jl:599; Array
   ╎    ╎    ╎    ╎    ╎    ╎    ╎  1   …ase/boot.jl:592; Array
   ╎    ╎    ╎    ╎    ╎    ╎    ╎   1   …ase/boot.jl:582; Array
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    1   …ase/boot.jl:535; new_as_memoryref
  1╎    ╎    ╎    ╎    ╎    ╎    ╎     1   …ase/boot.jl:516; GenericMemory
   ╎    ╎    ╎    ╎    ╎    ╎    9   …c/matmul.jl:253; mul!
   ╎    ╎    ╎    ╎    ╎    ╎     9   …c/matmul.jl:285; mul!
   ╎    ╎    ╎    ╎    ╎    ╎    ╎ 9   …c/matmul.jl:287; _mul!
   ╎    ╎    ╎    ╎    ╎    ╎    ╎  9   …c/matmul.jl:868; generic_matmatmul!
   ╎    ╎    ╎    ╎    ╎    ╎    ╎   2   …c/matmul.jl:892; _generic_matmatmul!(…
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    2   …actarray.jl:1312; getindex
   ╎    ╎    ╎    ╎    ╎    ╎    ╎     2   …actarray.jl:1341; _getindex
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 2   …bitarray.jl:682; getindex
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  2   …bitarray.jl:676; unsafe_bitgetind…
  2╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   2   …sentials.jl:916; getindex
   ╎    ╎    ╎    ╎    ╎    ╎    ╎   5   …c/matmul.jl:893; _generic_matmatmul!(…
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    5   …simdloop.jl:77; macro expansion
   ╎    ╎    ╎    ╎    ╎    ╎    ╎     5   …c/matmul.jl:894; macro expansion
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1   …actarray.jl:1312; getindex
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1   …actarray.jl:1341; _getindex
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1   …actarray.jl:1347; _to_linear_ind…
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1   …actarray.jl:3048; _sub2ind
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1   …actarray.jl:98; axes
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  +1 1   …bitarray.jl:105; size
  1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  +2 1   …ase/Base.jl:49; getproperty
  4╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 4   …se/array.jl:983; setindex!
   ╎    ╎    ╎    ╎    ╎    ╎    ╎   2   …c/matmul.jl:896; _generic_matmatmul!(…	
  2╎    ╎    ╎    ╎    ╎    ╎    ╎    2   …se/range.jl:908; iterate

1.11.2:

     ╎    ╎    ╎    ╎    ╎    ╎   15355 …c/matmul.jl:114; *
     ╎    ╎    ╎    ╎    ╎    ╎    1     …c/matmul.jl:117; matprod_dest
     ╎    ╎    ╎    ╎    ╎    ╎     1     …bitarray.jl:375; similar
     ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1     …ase/boot.jl:599; Array
     ╎    ╎    ╎    ╎    ╎    ╎    ╎  1     …ase/boot.jl:592; Array
     ╎    ╎    ╎    ╎    ╎    ╎    ╎   1     …ase/boot.jl:582; Array
     ╎    ╎    ╎    ╎    ╎    ╎    ╎    1     …ase/boot.jl:535; new_as_memoryref
    1╎    ╎    ╎    ╎    ╎    ╎    ╎     1     …ase/boot.jl:516; GenericMemory
     ╎    ╎    ╎    ╎    ╎    ╎    15354 …c/matmul.jl:253; mul!
     ╎    ╎    ╎    ╎    ╎    ╎     15354 …c/matmul.jl:285; mul!
     ╎    ╎    ╎    ╎    ╎    ╎    ╎ 15354 …c/matmul.jl:287; _mul!
     ╎    ╎    ╎    ╎    ╎    ╎    ╎  15354 …c/matmul.jl:868; generic_matmatmul!
     ╎    ╎    ╎    ╎    ╎    ╎    ╎   15280 …c/matmul.jl:895; _generic_matmatm…
     ╎    ╎    ╎    ╎    ╎    ╎    ╎    15280 …simdloop.jl:77; macro expansion
     ╎    ╎    ╎    ╎    ╎    ╎    ╎     15280 …c/matmul.jl:896; macro expansion
     ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 535   …actarray.jl:1312; getindex
     ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  535   …actarray.jl:1341; _getindex
     ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   404   …actarray.jl:1347; _to_linear…
     ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    404   …actarray.jl:3048; _sub2ind
     ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     404   …actarray.jl:98; axes
     ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  +1 404   …bitarray.jl:105; size
  404╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  +2 404   …ase/Base.jl:49; getproperty
     ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   131   …bitarray.jl:682; getindex
     ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    131   …bitarray.jl:676; unsafe_bit…
  131╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     131   …sentials.jl:917; getindex
     ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 502   …se/array.jl:930; getindex
  502╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  502   …sentials.jl:917; getindex
14243╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 14243 …se/array.jl:994; setindex!
     ╎    ╎    ╎    ╎    ╎    ╎    ╎   74    …c/matmul.jl:898; _generic_matmatm…
   74╎    ╎    ╎    ╎    ╎    ╎    ╎    74    …se/range.jl:908; iterate
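The report doesn't say how these profiles were collected. One way to produce a comparable tree is the Profile standard library, whose `Profile.print` uses the tree format by default. The sketch below is an assumption, not the reporter's setup: it profiles a random BitMatrix rather than the fuller example, and `profile_bitmatrix_matmul` is a hypothetical helper name.

```julia
using Profile

# Hypothetical helper: profiles a single n×n BitMatrix product and prints the
# call tree, similar in shape to the trees shown in the report.
function profile_bitmatrix_matmul(n::Integer; mincount::Integer = 10)
    a = BitMatrix(rand(Bool, n, n))
    a * a                                 # warm-up so compilation is not sampled
    Profile.clear()                       # discard samples from earlier runs
    @profile a * a
    Profile.print(; mincount = mincount)  # hide frames with fewer samples
    return nothing
end
```

Calling `profile_bitmatrix_matmul(3000)` matches the size used in the timings. Raising `mincount` trims rarely sampled frames, which keeps the tree short enough to compare across Julia versions.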

The difference doesn't seem to appear for a pair of similarly sized Matrix{Int} operands:

1.11.1:

a = Matrix{Int}(undef, 3000, 3000);
@time a * a'
  7.678765 seconds (4 allocations: 68.665 MiB, 0.45% gc time)

1.11.2:

a = Matrix{Int}(undef, 3000, 3000);
@time a * a'
  7.780284 seconds (4 allocations: 68.665 MiB, 0.49% gc time)
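To compare both cases in one run, a self-contained sketch might look like the following. It is an assumption rather than the reporter's code: it fills the matrix with random bits instead of `undef` contents, multiplies the Int matrix by itself (not by its adjoint), and `compare_matmul` is a hypothetical helper name. It also checks that both products agree, since `BitMatrix * BitMatrix` produces a `Matrix{Int}` of counts.

```julia
using LinearAlgebra

# Hypothetical helper (not from the report): times `x * x` for an n×n BitMatrix
# and for the same values stored as a Matrix{Int}, excluding compilation time.
function compare_matmul(n::Integer)
    a = BitMatrix(rand(Bool, n, n))  # random bits instead of `undef` contents
    b = Matrix{Int}(a)               # identical values in dense Int storage

    a * a; b * b                     # warm-up runs so compilation is not timed
    t_bit = @elapsed c_bit = a * a
    t_int = @elapsed c_int = b * b

    # Both products hold the same integer counts, so they must agree exactly.
    @assert c_bit == c_int
    return (bitmatrix = t_bit, int_matrix = t_int)
end
```

`compare_matmul(3000)` uses the size from the report and returns the two timings in seconds as a named tuple.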

Hopefully someone can reproduce this.

My system:

Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 11 × Apple M3 Pro
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m2)
jishnub added the performance and regression 1.11 labels Jan 5, 2025
nsajko added the linear algebra and arrays labels, and added then removed the bisect wanted label, Jan 5, 2025
Zentrik (Member) commented Jan 5, 2025

I bisected it to one of the commits in 0bd77f5...b28fbd0.

Zentrik (Member) commented Jan 5, 2025

Bisected on the master branch to 0af99e6.

giordano (Contributor) commented Jan 5, 2025

CC @jishnub (author of #56089)

jishnub (Contributor) commented Jan 6, 2025

I'm afk at the moment, but does reverting the change resolve the performance regression?

5 participants