Re: [Regression] CPU stalls and eventually causes a complete system freeze with 6.0.3 due to "video/aperture: Disable and unregister sysfb devices via aperture helpers"

Hi

On 23.10.22 at 10:04, Thorsten Leemhuis wrote:
Hi, this is your Linux kernel regression tracker speaking.

I noticed a regression report in bugzilla.kernel.org. As many (most?)
kernel developers don't keep an eye on it, I decided to forward it by
mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216616 :

  Andreas 2022-10-22 14:25:32 UTC

Created attachment 303074 [details]
dmesg

I've looked at the kernel log and found that simpledrm was loaded *after* amdgpu, which should never happen. The problematic patch was taken from a long series of refactoring patches for this code, so it is no wonder that it doesn't work as expected.
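
For anyone who wants to repeat that check on their own dmesg output, here is a minimal Python sketch. It assumes that the first log line mentioning a driver name is a usable proxy for when that driver was loaded, which is only a heuristic. The function names are made up for illustration.

```python
def first_mentions(dmesg_text, drivers=("amdgpu", "simpledrm")):
    """Return {driver: line_index} for the first log line mentioning each driver."""
    seen = {}
    for idx, line in enumerate(dmesg_text.splitlines()):
        for name in drivers:
            if name not in seen and name in line:
                seen[name] = idx
    return seen


def loaded_after(dmesg_text, late="simpledrm", early="amdgpu"):
    """True if `late` is first mentioned after `early`.

    Returns None if either driver never appears in the log.
    """
    seen = first_mentions(dmesg_text, (early, late))
    if early not in seen or late not in seen:
        return None
    return seen[late] > seen[early]
```

Feeding it the saved output of `dmesg` and getting `True` would match the ordering described above. `None` means one of the two drivers never showed up in the log.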

Please cherry-pick commit 9d69ef183815 ("fbdev/core: Remove remove_conflicting_pci_framebuffers()") into the 6.0 stable branch and report on the results. It should fix the problem.

Best regards
Thomas



6.0.2 works.

On 6.0.3 the system is very sluggish with graphic glitches all over the place in KDE Plasma Desktop X11 (no graphic glitches when using Wayland, but also sluggish). SDDM works fine.

Hardware: Lenovo Legion 5 Pro 16ACH6H: AMD Ryzen 7 5800H "Cezanne", hybrid graphics AMD "Green Sardine" (Vega 8 GCN 5.1, AMDGPU) and Nvidia GeForce RTX 3070 Mobile (GA104M, not working with nouveau, I'm not using the proprietary nvidia driver).

Comment 1 Andreas 2022-10-22 14:27:15 UTC

Created attachment 303075 [details]
my kernel .config for 6.0.3

Only CONFIG_HID_TOPRE was added in 6.0.3; otherwise it is identical to my .config for 6.0.2.

Comment 2 Andreas 2022-10-22 14:51:23 UTC

In /var/log/Xorg.0.log the only obvious difference is the last line:
---- snap
randr: falling back to unsynchronized pixmap sharing
---- snap
The line is present when I boot with 6.0.3, but isn't when I boot 6.0.2.

(Obviously this is when I login to KDE with X11, not with Wayland, from SDDM.)

Comment 3 Andreas 2022-10-22 22:10:19 UTC

I did a git bisect on the stable kernels, with 6.0.3 as bad and 6.0.2 as good. This is the result:

cfecfc98a78d97a49807531b5b224459bda877de is the first bad commit
commit cfecfc98a78d97a49807531b5b224459bda877de (HEAD, refs/bisect/bad)
Author: Thomas Zimmermann <tzimmermann@xxxxxxx>
Date:   Mon Jul 18 09:23:18 2022 +0200

     video/aperture: Disable and unregister sysfb devices via aperture helpers

     [ Upstream commit 5e01376124309b4dbd30d413f43c0d9c2f60edea ]

     Call sysfb_disable() before removing conflicting devices in aperture
     helpers. Fixes sysfb state if fbdev has been disabled.

     Signed-off-by: Thomas Zimmermann <tzimmermann@xxxxxxx>
     Reviewed-by: Javier Martinez Canillas <javierm@xxxxxxxxxx>
     Fixes: fb84efa28a48 ("drm/aperture: Run fbdev removal before internal helpers")
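
For readers unfamiliar with the procedure, a bisection like the one above is essentially a binary search over the commits between the known-good and the known-bad release. The following Python sketch illustrates the idea only. It assumes a linear history, and `is_bad` is a hypothetical stand-in for "build this commit, boot it, and see whether the fault appears". The real git bisect also handles non-linear history.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    `commits` is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad. `is_bad` is a caller-supplied test function.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]
```

Each round halves the remaining range, so the number of kernels to build and boot grows only logarithmically with the number of commits between the two releases.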

Comment 4 Andreas 2022-10-22 22:11:51 UTC

Link to the suspect patch:

https://patchwork.freedesktop.org/patch/msgid/20220718072322.8927-8-tzimmermann@xxxxxxx
(or https://patchwork.freedesktop.org/patch/494608/)

Comment 5 Andreas 2022-10-22 22:38:14 UTC

Okay, so I reverted v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch on stable 6.0.3 and the fault is gone.

I always logged out immediately, which worked (even though everything is very very sluggish). Also, when I killed the X session within a couple of seconds (15 or so), no error was shown (I used "systemctl stop sddm" from another virtual console).

Noteworthy: I once compiled a kernel from within the Plasma Desktop, while it was sluggish. The kernel compiled alright. When it was finished I moved the mouse to reboot, at which point it completely froze and I had to hard-reset the system.

While still running, after > 15 seconds, the fault looked like this (dmesg):
---- snap ----
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 7 jiffies s: 165 root: 0x2000/.
rcu: blocking rcu_node structures (internal RCU debug):
Task dump for CPU 13:
task:X               state:R  running task     stack:    0 pid: 4242 ppid:  4228 flags:0x00000008
Call Trace:
  <TASK>
  ? commit_tail+0xd7/0x130
  ? drm_atomic_helper_commit+0x126/0x150
  ? drm_atomic_commit+0xa4/0xe0
  ? drm_plane_get_damage_clips.cold+0x1c/0x1c
  ? drm_atomic_helper_dirtyfb+0x19e/0x280
  ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
  ? drm_ioctl_kernel+0xc4/0x150
  ? drm_ioctl+0x246/0x3f0
  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
  ? __x64_sys_ioctl+0x91/0xd0
  ? do_syscall_64+0x60/0xd0
  ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
  </TASK>
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 29 jiffies s: 165 root: 0x2000/.
rcu: blocking rcu_node structures (internal RCU debug):
Task dump for CPU 13:
task:X               state:R  running task     stack:    0 pid: 4242 ppid:  4228 flags:0x00000008
Call Trace:
  <TASK>
  ? commit_tail+0xd7/0x130
  ? drm_atomic_helper_commit+0x126/0x150
  ? drm_atomic_commit+0xa4/0xe0
  ? drm_plane_get_damage_clips.cold+0x1c/0x1c
  ? drm_atomic_helper_dirtyfb+0x19e/0x280
  ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
  ? drm_ioctl_kernel+0xc4/0x150
  ? drm_ioctl+0x246/0x3f0
  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
  ? __x64_sys_ioctl+0x91/0xd0
  ? do_syscall_64+0x60/0xd0
  ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
  </TASK>
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 8 jiffies s: 169 root: 0x2000/.
rcu: blocking rcu_node structures (internal RCU debug):
Task dump for CPU 13:
task:X               state:R  running task     stack:    0 pid: 4242 ppid:  4228 flags:0x0000400e
Call Trace:
  <TASK>
  ? memcpy_toio+0x76/0xc0
  ? drm_fb_memcpy_toio+0x76/0xb0
  ? drm_fb_blit_toio+0x75/0x2b0
  ? simpledrm_simple_display_pipe_update+0x132/0x150
  ? drm_atomic_helper_commit_planes+0xb6/0x230
  ? drm_atomic_helper_commit_tail+0x44/0x80
  ? commit_tail+0xd7/0x130
  ? drm_atomic_helper_commit+0x126/0x150
  ? drm_atomic_commit+0xa4/0xe0
  ? drm_plane_get_damage_clips.cold+0x1c/0x1c
  ? drm_atomic_helper_dirtyfb+0x19e/0x280
  ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
  ? drm_ioctl_kernel+0xc4/0x150
  ? drm_ioctl+0x246/0x3f0
  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
  ? __x64_sys_ioctl+0x91/0xd0
  ? do_syscall_64+0x60/0xd0
  ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
  </TASK>
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 30 jiffies s: 169 root: 0x2000/.
rcu: blocking rcu_node structures (internal RCU debug):
Task dump for CPU 13:
task:X               state:R  running task     stack:    0 pid: 4242 ppid:  4228 flags:0x0000400e
Call Trace:
  <TASK>
  ? memcpy_toio+0x76/0xc0
  ? memcpy_toio+0x1b/0xc0
  ? drm_fb_memcpy_toio+0x76/0xb0
  ? drm_fb_blit_toio+0x75/0x2b0
  ? simpledrm_simple_display_pipe_update+0x132/0x150
  ? drm_atomic_helper_commit_planes+0xb6/0x230
  ? drm_atomic_helper_commit_tail+0x44/0x80
  ? commit_tail+0xd7/0x130
  ? drm_atomic_helper_commit+0x126/0x150
  ? drm_atomic_commit+0xa4/0xe0
  ? drm_plane_get_damage_clips.cold+0x1c/0x1c
  ? drm_atomic_helper_dirtyfb+0x19e/0x280
  ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
  ? drm_ioctl_kernel+0xc4/0x150
  ? drm_ioctl+0x246/0x3f0
  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
  ? __x64_sys_ioctl+0x91/0xd0
  ? do_syscall_64+0x60/0xd0
  ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
  </TASK>
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 52 jiffies s: 169 root: 0x2000/.
rcu: blocking rcu_node structures (internal RCU debug):
Task dump for CPU 13:
task:X               state:R  running task     stack:    0 pid: 4242 ppid:  4228 flags:0x0000400e
Call Trace:
  <TASK>
  ? memcpy_toio+0x76/0xc0
  ? memcpy_toio+0x1b/0xc0
  ? drm_fb_memcpy_toio+0x76/0xb0
  ? drm_fb_blit_toio+0x75/0x2b0
  ? simpledrm_simple_display_pipe_update+0x132/0x150
  ? drm_atomic_helper_commit_planes+0xb6/0x230
  ? drm_atomic_helper_commit_tail+0x44/0x80
  ? commit_tail+0xd7/0x130
  ? drm_atomic_helper_commit+0x126/0x150
  ? drm_atomic_commit+0xa4/0xe0
  ? drm_plane_get_damage_clips.cold+0x1c/0x1c
  ? drm_atomic_helper_dirtyfb+0x19e/0x280
  ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
  ? drm_ioctl_kernel+0xc4/0x150
  ? drm_ioctl+0x246/0x3f0
  ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
  ? __x64_sys_ioctl+0x91/0xd0
  ? do_syscall_64+0x60/0xd0
  ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
  </TASK>
traps: avahi-ml[4447] general protection fault ip:7fdde6a37bc1 sp:7fdde07fc920 error:0 in module-zeroconf-publish.so[7fdde6a37000+3000]
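
The stall reports above are repetitive, so a condensed view can help when comparing them. Below is a minimal Python sketch that pulls the CPU number, the stall duration in jiffies, the `s:` value, and the first listed stack frame out of each report. The regular expressions are written against the lines quoted above and are an assumption about the format, so they may not match output from other kernel versions or configurations. The function name is made up for illustration.

```python
import re

# Header line of an expedited RCU stall report, as quoted in the log above.
STALL_RE = re.compile(
    r"rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: "
    r"\{ (?P<cpu>\d+)-\S* \} (?P<jiffies>\d+) jiffies s: (?P<seq>\d+)"
)
# A stack frame such as "  ? commit_tail+0xd7/0x130".
FRAME_RE = re.compile(r"^\s*\? (?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_stalls(log_text):
    """Return one (cpu, jiffies, seq, top_frame) tuple per stall report."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = STALL_RE.search(line)
        if header:
            current = [int(header["cpu"]), int(header["jiffies"]),
                       int(header["seq"]), None]
            reports.append(current)
            continue
        frame = FRAME_RE.match(line)
        # Record only the first frame seen after each report header.
        if frame and current is not None and current[3] is None:
            current[3] = frame["sym"]
    return [tuple(r) for r in reports]
```

Applied to the log above, the reports with `s: 165` start at `commit_tail`, while those with `s: 169` start at `memcpy_toio` with `simpledrm_simple_display_pipe_update` further down the trace. Keep in mind that frames prefixed with `?` are ones the kernel's unwinder could not verify, so the top frame is a hint rather than proof.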


See the ticket for more details.

BTW, let me use this mail to also add the report to the list of tracked
regressions to ensure it doesn't fall through the cracks:

#regzbot introduced: cfecfc98a78d9
https://bugzilla.kernel.org/show_bug.cgi?id=216616
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev
