Display loss recovery attempt for track-wlroots 0.18 branch #2458

Open
kode54 opened this issue Aug 29, 2024 · 14 comments

@kode54
Contributor

kode54 commented Aug 29, 2024

Here is my attempt at implementing display loss recovery for wlroots 0.18:

https://gist.github.com/kode54/58b9e30ed73f82e1cfb040fe84f36c66

It doesn't work so well.

Last attempt crashes with this backtrace:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, 
    no_tid=no_tid@entry=0) at pthread_kill.c:44
44	     return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;

#0  __pthread_kill_implementation
    (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x0000790b680a5463 in __pthread_kill_internal (threadid=<optimized out>, signo=6)
    at pthread_kill.c:78
#2  0x0000790b6804c120 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x0000790b680334c3 in __GI_abort () at abort.c:79
#4  0x0000790b680333df in __assert_fail_base
    (fmt=0x790b681c3c20 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5aacbd5146cd "handle || is_shutting_down()", file=file@entry=0x5aacbd5143de "../src/core/output-layout.cpp", line=line@entry=1709, function=function@entry=0x5aacbd517ae0 "wf::output_t* wf::output_layout_t::impl::get_output_coords_at(const wf::pointf_t&, wf::pointf_t&)") at assert.c:94
#5  0x0000790b68044177 in __assert_fail
    (assertion=0x5aacbd5146cd "handle || is_shutting_down()", file=0x5aacbd5143de "../src/core/output-layout.cpp", line=1709, function=0x5aacbd517ae0 "wf::output_t* wf::output_layout_t::impl::get_output_coords_at(const wf::pointf_t&, wf::pointf_t&)") at assert.c:103
#6  0x00005aacbd45f7a8 in wf::output_layout_t::impl::get_output_coords_at(wf::pointf_t const&, wf::pointf_t&) [clone .part.0] [clone .lto_priv.0]
    (closest=<optimized out>, origin=<optimized out>, this=<optimized out>)
    at ../src/core/output-layout.cpp:1709
#7  0x00005aacbd4782c0 in wf::output_layout_t::impl::get_output_coords_at
    (origin=<synthetic pointer>..., this=0x5aace2698960, closest=...) at ../src/core/core.cpp:297
#8  wf::output_layout_t::get_output_coords_at (this=<optimized out>, origin=..., closest=...)
    at ../src/core/output-layout.cpp:1762
#9  wf::compositor_core_impl_t::reconfigure_outputs (this=0x5aace0cd5430)
    at ../src/core/core.cpp:239
#10 0x00005aacbd513796 in main::{lambda(void*)#1}::operator()(void*) const [clone .isra.0]
    (__closure=0x5aace18aca20, data=<optimized out>) at ../src/main.cpp:458
#11 0x00005aacbd442c82 in std::function<void(void*)>::operator()
    (this=<optimized out>, __args#0=<optimized out>)
    at /usr/include/c++/14.2.1/bits/std_function.h:591
#12 wf::wl_listener_wrapper::emit (this=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:57
#13 wf::handle_wrapped_listener (listener=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:10
#14 0x0000790b68a0342e in wl_signal_emit_mutable
    (signal=signal@entry=0x5aace0fc13b8, data=data@entry=0x0)
    at ../wayland-1.23.0/src/wayland-server.c:2314
#15 0x0000790b68913b5f in begin_gles2_buffer_pass
    (buffer=0x5aace1e82560, prev_ctx=0x7ffc170a37a0, timer=0x0)
    at ../wlroots-hidpi-xprop/render/gles2/pass.c:258
#16 gles2_begin_buffer_pass
    (wlr_renderer=<optimized out>, wlr_buffer=0x5aace1e4eb30, options=<optimized out>)
    at ../wlroots-hidpi-xprop/render/gles2/renderer.c:262
#17 0x0000790b6890ce35 in wlr_renderer_begin_buffer_pass
    (renderer=<optimized out>, buffer=<optimized out>, options=<optimized out>)
    at ../wlroots-hidpi-xprop/render/wlr_renderer.c:304
#18 0x00005aacbd4f71da in wf::swapchain_damage_manager_t::start_frame (this=0x5aace1745de0)
    at ../src/output/render-manager.cpp:331
#19 wf::render_manager::impl::paint (this=0x5aace1d6b1b0) at ../src/output/render-manager.cpp:1130
#20 0x00005aacbd442ce6 in std::function<void()>::operator() (this=<optimized out>)
    at /usr/include/c++/14.2.1/bits/std_function.h:591
#21 handle_timeout (data=<optimized out>) at ../src/util.cpp:31
#22 0x0000790b68a053a6 in wl_timer_heap_dispatch (timers=0x5aace0cd5388)
    at ../wayland-1.23.0/src/event-loop.c:527
#23 wl_event_loop_dispatch (loop=0x5aace0cd5330, timeout=<optimized out>, timeout@entry=-1)
    at ../wayland-1.23.0/src/event-loop.c:1098
#24 0x0000790b68a0710f in wl_display_run (display=0x5aace0cd5240)
    at ../wayland-1.23.0/src/wayland-server.c:1530
#25 0x00005aacbd4410bb in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.cpp:509

And then it drops to a terminal and fails to restart cage as my login manager, and hangs the GPU completely.

@ammen99
Member

ammen99 commented Aug 29, 2024

If you're doing this, you need to make every plugin which has GL state (textures, framebuffers, programs) reload its state as well.

@kode54
Contributor Author

kode54 commented Aug 29, 2024

May as well reload them all, then.

To provoke a reset, at least on amdgpu:

# cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover

@ammen99
Member

ammen99 commented Sep 1, 2024

> May as well reload them all, then.
>
> To provoke a reset, at least on amdgpu:
>
> # cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover

Unloading a plugin might lose a lot of temporary state, which is not what we want in the ideal case. Not to mention that some plugins cannot be unloaded safely.

@soreau
Member

soreau commented Sep 1, 2024

I can't really think of an 'unloadable' plugin that also does GL stuff. Are there any?

@ammen99
Member

ammen99 commented Sep 1, 2024

> I can't really think of an 'unloadable' plugin that also does GL stuff. Are there any?

I'd prefer to not make assumptions, maybe such a plugin will come in the future.

@soreau
Member

soreau commented Sep 1, 2024

Sure, but I was thinking more along the lines of testing with no plugins loaded. If that works, then maybe hinge on the unloadable flag for now until it works, and then consider adding a new flag.

@kode54
Contributor Author

kode54 commented Sep 2, 2024

Maybe instead a notification should be plumbed to plugins that need it, to notify them that they need to free and reallocate their GPU resources? Would be better than forcing a full unload.

@ammen99
Member

ammen99 commented Sep 2, 2024

> Maybe instead a notification should be plumbed to plugins that need it, to notify them that they need to free and reallocate their GPU resources? Would be better than forcing a full unload.

Yes, that's the best solution.
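
As a rough illustration of the notification idea discussed above, here is a minimal sketch of a reset notifier that plugins could subscribe to. Every name in it (`gpu_reset_notifier_t`, `example_plugin_t`, `on_gpu_reset`) is hypothetical and not part of the Wayfire API, and the "GPU resources" are plain counters so that the sketch compiles without any GL headers.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Hypothetical notifier owned by the compositor core (not a Wayfire class).
class gpu_reset_notifier_t
{
  public:
    using callback_t = std::function<void()>;

    // Register a callback; the returned id can be passed to disconnect().
    std::size_t connect(callback_t cb)
    {
        std::size_t id = next_id++;
        callbacks.emplace(id, std::move(cb));
        return id;
    }

    void disconnect(std::size_t id)
    {
        callbacks.erase(id);
    }

    // Meant to be called once the new rendering context is usable.
    void emit() const
    {
        // Iterate over a copy so callbacks may disconnect while running.
        auto snapshot = callbacks;
        for (auto& entry : snapshot)
        {
            entry.second();
        }
    }

  private:
    std::map<std::size_t, callback_t> callbacks;
    std::size_t next_id = 0;
};

// Hypothetical plugin; its GPU state is modelled by two plain fields.
struct example_plugin_t
{
    bool has_texture = false;
    int texture_generation = 0;

    void allocate_gpu_resources()
    {
        has_texture = true;
        ++texture_generation;
    }

    void free_gpu_resources()
    {
        has_texture = false;
    }

    // Drop the stale handles, then recreate them. Other plugin state is
    // left alone, which is the advantage over a full unload and reload.
    void on_gpu_reset()
    {
        free_gpu_resources();
        allocate_gpu_resources();
    }
};
```

A plugin would call `connect()` with a lambda that invokes its own `on_gpu_reset()`, and the core would call `emit()` after recreating the renderer. Only plugins that hold GL state need to subscribe, so no assumption about which plugins are unloadable is required.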

@kode54
Contributor Author

kode54 commented Sep 14, 2024

New attempt without any plugins that would hold GL state; new backtrace:

#0  __pthread_kill_implementation
    (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at pthread_kill.c:44
#1  0x00007da0c01b6463 in __pthread_kill_internal (threadid=<optimized out>, signo=6)
    at pthread_kill.c:78
#2  0x00007da0c015d120 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007da0c01444c3 in __GI_abort () at abort.c:79
#4  0x00007da0c01443df in __assert_fail_base
    (fmt=0x7da0c02d4c20 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x6362c73a2c88 "wlr_texture_is_gles2(texture)", file=file@entry=0x6362c73a291a "../src/core/opengl.cpp", line=line@entry=580, function=function@entry=0x6362c73a65c0 "wf::texture_t::texture_t(wlr_texture*, std::optional<wlr_fbox>)") at assert.c:94
#5  0x00007da0c0155177 in __assert_fail
    (assertion=0x6362c73a2c88 "wlr_texture_is_gles2(texture)", file=0x6362c73a291a "../src/core/opengl.cpp", line=580, function=0x6362c73a65c0 "wf::texture_t::texture_t(wlr_texture*, std::optional<wlr_fbox>)") at assert.c:103
#6  0x00006362c72fa240 in wf::texture_t::texture_t
    (this=this@entry=0x7fff1604d4a0, texture=0x6362f40c78f0, viewport=Python Exception <class 'gdb.error'>: value has been optimized out
..., this=<optimized out>, texture=<optimized out>, viewport=Python Exception <class 'gdb.error'>: value has been optimized out
...) at ../src/core/opengl.cpp:580
#7  0x00006362c7378926 in wf::scene::wlr_surface_node_t::wlr_surface_render_instance_t::render (this=0x6362f41f62f0, target=..., region=...) at ../src/view/wlr-surface-node.cpp:317
#8  0x00006362c73841b3 in wf::scene::render_instance_t::render
    (this=<optimized out>, target=..., region=..., custom_data=std::any [no contained value]) at ../src/api/wayfire/scene-render.hpp:121
#9  wf::scene::run_render_pass (params=..., flags=flags@entry=3)
    at ../src/output/render-manager.cpp:1227
#10 0x00006362c7385146 in wf::render_manager::impl::render_output (this=0x6362f39fb180)
    at ../src/output/render-manager.cpp:1092
#11 wf::render_manager::impl::paint (this=0x6362f39fb180)
    at ../src/output/render-manager.cpp:1141
#12 0x00006362c7386428 in wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}::operator()(void*) const (__closure=0x6362f39fb180) at ../src/output/render-manager.cpp:968
#13 std::__invoke_impl<void, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*>(std::__invoke_other, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*&&) (__f=...) at /usr/include/c++/14.2.1/bits/invoke.h:61
#14 std::__invoke_r<void, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*>(wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*&&)
    (__fn=...) at /usr/include/c++/14.2.1/bits/invoke.h:111
#15 std::_Function_handler<void (void*), wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}>::_M_invoke(std::_Any_data const&, void*&&)
    (__functor=..., __args#0=<optimized out>)
    at /usr/include/c++/14.2.1/bits/std_function.h:290
#16 0x00006362c72d0e82 in std::function<void(void*)>::operator()
    (this=<optimized out>, __args#0=<optimized out>)
    at /usr/include/c++/14.2.1/bits/std_function.h:591
#17 wf::wl_listener_wrapper::emit (this=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:57
#18 wf::handle_wrapped_listener (listener=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:10
#19 0x00007da0c0acc47e in wl_signal_emit_mutable
    (signal=<optimized out>, data=0x6362f3928f30)
    at ../wayland-1.23.1/src/wayland-server.c:2314
#20 0x00007da0c0acdefc in wl_event_loop_dispatch_idle (loop=loop@entry=0x6362f27e2330)
    at ../wayland-1.23.1/src/event-loop.c:970
#21 0x00007da0c0ace177 in wl_event_loop_dispatch
    (loop=0x6362f27e2330, timeout=<optimized out>, timeout@entry=-1)
    at ../wayland-1.23.1/src/event-loop.c:1110
#22 0x00007da0c0ad01f7 in wl_display_run (display=0x6362f27e2240)
    at ../wayland-1.23.1/src/wayland-server.c:1530
#23 0x00006362c72cf2db in main (argc=<optimized out>, argv=<optimized out>)
    at ../src/main.cpp:515

@ammen99
Member

ammen99 commented Sep 14, 2024

Some wild guesses based on the stacktrace - Wayfire keeps a reference of the surface's texture/buffer:

this->current_buffer = &surface->buffer->base;

Depending on how wlroots has implemented GPU reset handling, maybe they change the texture/buffer pointer? So after the reset, we still hold on to the old texture until a new buffer is committed, but the old texture isn't valid anymore because of the gpu reset?

@DemiMarie

> Some wild guesses based on the stacktrace - Wayfire keeps a reference of the surface's texture/buffer:
>
> this->current_buffer = &surface->buffer->base;
>
> Depending on how wlroots has implemented GPU reset handling, maybe they change the texture/buffer pointer? So after the reset, we still hold on to the old texture until a new buffer is committed, but the old texture isn't valid anymore because of the gpu reset?

That sounds about right to me. After a GPU reset, old textures are useless, whether the pointers are valid or not.
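
One way to guard against using a texture that predates a reset is to tag each cached handle with a context generation and drop it when the generation changes. The sketch below only illustrates that pattern; `renderer_state_t`, `cached_texture_t` and `surface_cache_t` are made-up names, the handle is a bare integer, and nothing here reflects how Wayfire or wlroots actually track buffers.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the renderer: the generation is bumped every
// time the rendering context is recreated after a reset.
struct renderer_state_t
{
    std::uint64_t generation = 0;

    void on_context_recreated()
    {
        ++generation;
    }
};

// Hypothetical cached texture reference, tagged with the generation of
// the context it was created under.
struct cached_texture_t
{
    std::uint32_t handle = 0;
    std::uint64_t generation = 0;
};

class surface_cache_t
{
  public:
    // Remember a texture handle created under the current context.
    void store(const renderer_state_t& renderer, std::uint32_t handle)
    {
        cached = cached_texture_t{handle, renderer.generation};
    }

    // Return the handle only if it belongs to the current context.
    // A handle from before a reset is discarded instead of being used.
    std::optional<std::uint32_t> get(const renderer_state_t& renderer)
    {
        if (cached && (cached->generation != renderer.generation))
        {
            cached.reset();
        }

        if (!cached)
        {
            return std::nullopt;
        }

        return cached->handle;
    }

  private:
    std::optional<cached_texture_t> cached;
};
```

With this shape, a render path that gets `std::nullopt` would skip the surface until a new buffer is committed, rather than asserting on a texture that is no longer valid.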

@soreau
Member

soreau commented Nov 10, 2024

After discussing a bit with emersion on IRC, it seems wlroots is emitting the signal too early, before the GPU is ready. According to the spec:

> For this extension, the application is expected to query the reset status until NO_ERROR is returned. If a reset is encountered, at least one *RESET* status will be returned. Once NO_ERROR is again encountered, the application can safely destroy the old context and create a new one.

Since wlroots queries only once for NO_ERROR, and the GPU reset timeout is most likely on the order of seconds, it's very likely emitting the signal too early to reset the context, because the GPU isn't ready yet. You might be able to check for this in your patch with while (core.renderer->procs.glGetGraphicsResetStatusKHR() != GL_NO_ERROR); at the top of your on_renderer_lost() handler.
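
A bare busy loop could spin forever if the GPU never recovers, so a bounded variant may be safer for testing. The following is a minimal sketch under stated assumptions: the status query is passed in as a `std::function` so the code compiles without GL headers, `wait_for_reset_complete` is a hypothetical helper name, and the two constants are local copies of the values of `GL_NO_ERROR` and `GL_GUILTY_CONTEXT_RESET`.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Local copies of the GL enum values, defined here only so that the
// sketch is self-contained.
constexpr unsigned kNoError = 0;
constexpr unsigned kGuiltyContextReset = 0x8253;

// Polls query() until it reports kNoError or the timeout elapses.
// Returns true if kNoError was observed in time, false otherwise.
bool wait_for_reset_complete(const std::function<unsigned()>& query,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds interval = std::chrono::milliseconds(10))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (query() != kNoError)
    {
        if (std::chrono::steady_clock::now() >= deadline)
        {
            return false;
        }

        // Sleep between polls instead of spinning at full speed.
        std::this_thread::sleep_for(interval);
    }

    return true;
}
```

In a handler like on_renderer_lost(), the query argument would presumably wrap the real glGetGraphicsResetStatusKHR call. Note that sleeping here blocks the compositor's event loop for the duration of the wait, so this is a debugging aid; re-checking from a timer would likely be the better fit for a real implementation.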

@kode54
Contributor Author

kode54 commented Nov 11, 2024

New backtrace for you:

#0  0x00007c05111b63f4 in ?? () from /usr/lib/libc.so.6
#1  0x00007c051115d120 in raise () from /usr/lib/libc.so.6
#2  0x00007c05111444c3 in abort () from /usr/lib/libc.so.6
#3  0x00007c05111443df in ?? () from /usr/lib/libc.so.6
#4  0x00007c0511155177 in __assert_fail () from /usr/lib/libc.so.6
#5  0x00005dde0e722508 in wf::output_layout_t::impl::get_output_coords_at(wf::pointf_t const&, wf::pointf_t&) [clone .part.0] [clone .lto_priv.0] (closest=..., origin=..., this=<optimized out>)
    at ../src/core/output-layout.cpp:1726
#6  0x00005dde0e73b060 in wf::output_layout_t::impl::get_output_coords_at (
    origin=<synthetic pointer>..., this=0x5dde1bfb5a80, closest=...) at ../src/core/core.cpp:304
#7  wf::output_layout_t::get_output_coords_at (this=<optimized out>, origin=..., closest=...)
    at ../src/core/output-layout.cpp:1779
#8  wf::compositor_core_impl_t::reconfigure_outputs (this=0x5dde19c8dd30)
    at ../src/core/core.cpp:246
#9  0x00005dde0e7d64b3 in main::{lambda(void*)#1}::operator()(void*) const [clone .isra.0] (
    __closure=0x5dde1b2e1830, data=<optimized out>) at ../src/main.cpp:471
#10 0x00005dde0e705d02 in std::function<void(void*)>::operator() (this=<optimized out>, 
    __args#0=<optimized out>) at /usr/include/c++/14.2.1/bits/std_function.h:591
#11 wf::wl_listener_wrapper::emit (this=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:57
#12 wf::handle_wrapped_listener (listener=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:10
#13 0x00007c0511bc947e in wl_signal_emit_mutable (signal=signal@entry=0x5dde19ef85e8, 
    data=data@entry=0x0) at ../wayland-1.23.1/src/wayland-server.c:2314
#14 0x00007c0511ad9b3f in begin_gles2_buffer_pass (buffer=0x5dde1bc975b0, prev_ctx=0x7ffe1e5b1f80, 
    timer=0x0) at ../wlroots-hidpi-xprop/render/gles2/pass.c:258
#15 gles2_begin_buffer_pass (wlr_renderer=<optimized out>, wlr_buffer=0x5dde1b577760, 
    options=<optimized out>) at ../wlroots-hidpi-xprop/render/gles2/renderer.c:262
#16 0x00007c0511ad2e15 in wlr_renderer_begin_buffer_pass (renderer=<optimized out>, 
    buffer=<optimized out>, options=<optimized out>)
    at ../wlroots-hidpi-xprop/render/wlr_renderer.c:304
#17 0x00005dde0e7b9a3a in wf::swapchain_damage_manager_t::start_frame (this=0x5dde1b1fd1e0)
    at ../src/output/render-manager.cpp:331
#18 wf::render_manager::impl::paint (this=0x5dde1b8b93c0) at ../src/output/render-manager.cpp:1130
#19 0x00005dde0e7bb528 in wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}::operator()(void*) const (__closure=0x5dde1b8b93c0) at ../src/output/render-manager.cpp:968
#20 std::__invoke_impl<void, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*>(std::__invoke_other, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*&&)
    (__f=...) at /usr/include/c++/14.2.1/bits/invoke.h:61
#21 std::__invoke_r<void, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*>(wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*&&) (__fn=...)
    at /usr/include/c++/14.2.1/bits/invoke.h:111
#22 std::_Function_handler<void (void*), wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}>::_M_invoke(std::_Any_data const&, void*&&) (__functor=..., __args#0=<optimized out>)
    at /usr/include/c++/14.2.1/bits/std_function.h:290
#23 0x00005dde0e705d02 in std::function<void(void*)>::operator() (this=<optimized out>, 
    __args#0=<optimized out>) at /usr/include/c++/14.2.1/bits/std_function.h:591
#24 wf::wl_listener_wrapper::emit (this=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:57
#25 wf::handle_wrapped_listener (listener=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:10
#26 0x00007c0511bc947e in wl_signal_emit_mutable (signal=<optimized out>, data=0x5dde1b7f8950)
    at ../wayland-1.23.1/src/wayland-server.c:2314
#27 0x00007c0511bcaefc in wl_event_loop_dispatch_idle (loop=loop@entry=0x5dde19c8dc30)
    at ../wayland-1.23.1/src/event-loop.c:970
#28 0x00007c0511bcb177 in wl_event_loop_dispatch (loop=0x5dde19c8dc30, timeout=<optimized out>, 
    timeout@entry=-1) at ../wayland-1.23.1/src/event-loop.c:1110
#29 0x00007c0511bcd1f7 in wl_display_run (display=0x5dde19c8db40)
    at ../wayland-1.23.1/src/wayland-server.c:1530
#30 0x00005dde0e704169 in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.cpp:522

@soreau
Member

soreau commented Jan 10, 2025

This appears to be part of a working implementation for sway.
