Hi Andreas

On 24.10.22 at 18:19, Andreas Thalhammer wrote:
> On 24.10.22 at 13:31, Thomas Zimmermann wrote:
>> Hi
>>
>> On 24.10.22 at 13:27, Greg KH wrote:
>>> On Mon, Oct 24, 2022 at 12:41:43PM +0200, Thorsten Leemhuis wrote:
>>>> Hi! Thx for the reply.
>>>>
>>>> On 24.10.22 12:26, Thomas Zimmermann wrote:
>>>>> On 23.10.22 at 10:04, Thorsten Leemhuis wrote:
>>>>>> I noticed a regression report in bugzilla.kernel.org. As many
>>>>>> (most?) kernel developers don't keep an eye on it, I decided to
>>>>>> forward it by mail. Quoting from
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=216616 :
>>>>>>
>>>>>>> Andreas 2022-10-22 14:25:32 UTC
>>>>>>> Created attachment 303074 [details]
>>>>>>> dmesg
>>>>>
>>>>> I've looked at the kernel log and found that simpledrm has been
>>>>> loaded *after* amdgpu, which should never happen. The problematic
>>>>> patch has been taken from a long list of refactoring work on this
>>>>> code. No wonder that it doesn't work as expected.
>>>>>
>>>>> Please cherry-pick commit
>>>>>
>>>>> 9d69ef183815 ("fbdev/core: Remove remove_conflicting_pci_framebuffers()")
>>>>>
>>>>> into the 6.0 stable branch and report on the results. It should
>>>>> fix the problem.
>>>>
>>>> Greg, is that enough for you to pick this up? Or do you want Andreas
>>>> to test first if it really fixes the reported problem?
>>>
>>> This should be good enough. If this does NOT fix the issue, please let
>>> me know.
>>
>> Thanks a lot. I think I can provide a dedicated fix if the proposed
>> commit doesn't work.
>>
>> Best regards
>> Thomas
>>
>>> thanks,
>>>
>>> greg k-h
>
> Thanks... In short: the additional patch did NOT fix the problem.
Yeah, it's also part of a larger changeset. But I wouldn't want to backport all those changes either.
Attached is a simple patch for linux-stable that adds the necessary fix. If this still doesn't work, we should probably revert the problematic patch.
Please test the patch and let me know if it works.

Best regards
Thomas
> I don't use git and I don't know how to /cherry-pick commit/
> 9d69ef183815, but I found the patch here:
> https://patchwork.freedesktop.org/patch/494609/
> I hope that's the right one.
>
> I reintegrated
> v2-07-11-video-aperture-Disable-and-unregister-sysfb-devices-via-aperture-helpers.patch
> and also applied
> v2-04-11-fbdev-core-Remove-remove_conflicting_pci_framebuffers.patch,
> did a "make mrproper" and thereafter compiled a clean new 6.0.3 kernel
> (same .config).
>
> Now the system doesn't even boot to a console. The first boot got me to
> an rcu_sched stall on CPUs/tasks, same as above, but this time with:
> Workqueue: btrfs-cache btrfs_work_helper
>
> I booted a second time with the same kernel, and it got stuck after
> mounting the root btrfs filesystem (what looked like a total freeze, but
> when it didn't show a rcu_stall message after ~2 min I got impatient and
> wanted to see if I had just busted my root filesystem...)
>
> I booted 6.0.2 and everything is fine. (I'm very glad! I definitely
> should update my backup right away!)
>
> I will try 6.1-rc1 next, bear with...
-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Managing Director: Ivo Totev
From ba55e238e64817a2369a267153a5b980683465a1 Mon Sep 17 00:00:00 2001
From: Thomas Zimmermann <tzimmermann@xxxxxxx>
Date: Tue, 25 Oct 2022 09:38:44 +0200
Subject: [PATCH] video/aperture: Call sysfb_disable() before removing PCI devices

Call sysfb_disable() from aperture_remove_conflicting_pci_devices()
before removing PCI devices. Without, simpledrm can still bind to
simple-framebuffer devices after the hardware driver has taken over
the hardware. Both drivers interfere with each other and results are
undefined.

Reported modesetting errors are shown below.

---- snap ----
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 7 jiffies s: 165 root: 0x2000/.
rcu: blocking rcu_node structures (internal RCU debug):
Task dump for CPU 13:
task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x00000008
Call Trace:
 <TASK>
 ? commit_tail+0xd7/0x130
 ? drm_atomic_helper_commit+0x126/0x150
 ? drm_atomic_commit+0xa4/0xe0
 ? drm_plane_get_damage_clips.cold+0x1c/0x1c
 ? drm_atomic_helper_dirtyfb+0x19e/0x280
 ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? drm_ioctl_kernel+0xc4/0x150
 ? drm_ioctl+0x246/0x3f0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? __x64_sys_ioctl+0x91/0xd0
 ? do_syscall_64+0x60/0xd0
 ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
 </TASK>
...
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 13-.... } 30 jiffies s: 169 root: 0x2000/.
rcu: blocking rcu_node structures (internal RCU debug):
Task dump for CPU 13:
task:X state:R running task stack: 0 pid: 4242 ppid: 4228 flags:0x0000400e
Call Trace:
 <TASK>
 ? memcpy_toio+0x76/0xc0
 ? memcpy_toio+0x1b/0xc0
 ? drm_fb_memcpy_toio+0x76/0xb0
 ? drm_fb_blit_toio+0x75/0x2b0
 ? simpledrm_simple_display_pipe_update+0x132/0x150
 ? drm_atomic_helper_commit_planes+0xb6/0x230
 ? drm_atomic_helper_commit_tail+0x44/0x80
 ? commit_tail+0xd7/0x130
 ? drm_atomic_helper_commit+0x126/0x150
 ? drm_atomic_commit+0xa4/0xe0
 ? drm_plane_get_damage_clips.cold+0x1c/0x1c
 ? drm_atomic_helper_dirtyfb+0x19e/0x280
 ? drm_mode_dirtyfb_ioctl+0x10f/0x1e0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? drm_ioctl_kernel+0xc4/0x150
 ? drm_ioctl+0x246/0x3f0
 ? drm_mode_getfb2_ioctl+0x2d0/0x2d0
 ? __x64_sys_ioctl+0x91/0xd0
 ? do_syscall_64+0x60/0xd0
 ? entry_SYSCALL_64_after_hwframe+0x4b/0xb5
 </TASK>

The problem was introduced by backporting commit 5e0137612430
("video/aperture: Disable and unregister sysfb devices via aperture
helpers") to v6.0.3 and does not exist in the mainline branch.

Reported-by: Andreas Thalhammer <andreas.thalhammer-linux@xxxxxxx>
Reported-by: Thorsten Leemhuis <regressions@xxxxxxxxxxxxx>
Signed-off-by: Thomas Zimmermann <tzimmermann@xxxxxxx>
Fixes: cfecfc98a78d ("video/aperture: Disable and unregister sysfb devices via aperture helpers")
Cc: Thomas Zimmermann <tzimmermann@xxxxxxx>
Cc: Javier Martinez Canillas <javierm@xxxxxxxxxx>
Cc: Zack Rusin <zackr@xxxxxxxxxx>
Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
Cc: Daniel Vetter <daniel@xxxxxxxx>
Cc: Sam Ravnborg <sam@xxxxxxxxxxxx>
Cc: Helge Deller <deller@xxxxxx>
Cc: Alex Deucher <alexander.deucher@xxxxxxx>
Cc: Zhen Lei <thunder.leizhen@xxxxxxxxxx>
Cc: Changcheng Deng <deng.changcheng@xxxxxxxxxx>
Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
Cc: Maxime Ripard <mripard@xxxxxxxxxx>
Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
Cc: Sasha Levin <sashal@xxxxxxxxxx>
Cc: linux-fbdev@xxxxxxxxxxxxxxx
Cc: <stable@xxxxxxxxxxxxxxx> # v6.0.3+
Link: https://lore.kernel.org/dri-devel/d6afe54b-f8d7-beb2-3609-186e566cbfac@xxxxxxx/T/#t
---
 drivers/video/aperture.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/video/aperture.c b/drivers/video/aperture.c
index d245826a9324d..cc6427a091bc7 100644
--- a/drivers/video/aperture.c
+++ b/drivers/video/aperture.c
@@ -338,6 +338,17 @@ int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *na
 	resource_size_t base, size;
 	int bar, ret;
 
+	/*
+	 * If a driver asked to unregister a platform device registered by
+	 * sysfb, then can be assumed that this is a driver for a display
+	 * that is set up by the system firmware and has a generic driver.
+	 *
+	 * Drivers for devices that don't have a generic driver will never
+	 * ask for this, so let's assume that a real driver for the display
+	 * was already probed and prevent sysfb to register devices later.
+	 */
+	sysfb_disable();
+
 	/*
 	 * WARNING: Apparently we must kick fbdev drivers before vgacon,
 	 * otherwise the vga fbdev driver falls over.
-- 
2.38.0
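
For readers who want to see the ordering problem in isolation, here is a minimal user-space sketch of the idea behind the patch. It is not kernel code: `struct model_sysfb` and all `model_*` functions are hypothetical stand-ins invented for illustration, and the real sysfb_disable() does more (locking, unregistering the platform device). The sketch only assumes what the commit message states: once the native driver has called the disable step, a later generic framebuffer registration must be refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for sysfb's state; not the kernel's actual data. */
struct model_sysfb {
	bool disabled;   /* set once a native driver has taken over */
	bool registered; /* a generic framebuffer device currently exists */
};

/* Models the disable step: drop any generic device, refuse later ones. */
static void model_sysfb_disable(struct model_sysfb *s)
{
	assert(s != NULL);
	s->registered = false;
	s->disabled = true;
}

/*
 * Models a (possibly late) registration of the generic framebuffer
 * device. Returns true if a device was registered.
 */
static bool model_sysfb_register(struct model_sysfb *s)
{
	assert(s != NULL);
	if (s->disabled)
		return false;
	s->registered = true;
	return true;
}

/* Models the native driver's probe path, with or without the fix. */
static void model_native_driver_probe(struct model_sysfb *s, bool with_fix)
{
	assert(s != NULL);
	if (with_fix)
		model_sysfb_disable(s);
	/* Removal of conflicting devices would follow here. */
}
```

With `with_fix` set, a call to `model_sysfb_register()` after the probe returns false, so no generic device can appear behind the native driver. Without it, the late registration succeeds, which corresponds to simpledrm binding after amdgpu in the report.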