On Fri, 2022-06-17 at 14:41 +0200, Thomas Zimmermann wrote: > Hi > > Am 17.06.22 um 14:29 schrieb Javier Martinez Canillas: > > [adding Zack and Alex to Cc list] > > > > Hello Thomas, > > > > Thanks a lot for tracking this down and figuring out the root cause! > > > > On 6/17/22 14:10, Thomas Zimmermann wrote: > > > Always run fbdev removal first to remove simpledrm via > > > sysfb_disable(). This clears the internal state. The later call > > > to drm_aperture_detach_drivers() then does nothing. Otherwise, > > > with drm_aperture_detach_drivers() running first, the call to > > > sysfb_disable() uses inconsistent state. > > > > > > Example backtrace show below: > > > > > > [ 11.663422] ================================================================== > > > [ 11.663426] BUG: KASAN: use-after-free in device_del+0x79/0x5f0 > > > [ 11.663435] Read of size 8 at addr ffff888108185050 by task systemd-udevd/311 > > > [ 11.663440] CPU: 0 PID: 311 Comm: systemd-udevd Tainted: G E 5 > > > .19.0-rc2-1-default+ #1689 > > > [ 11.663445] Hardware name: HP ProLiant DL120 G7, BIOS J01 04/21/2011 > > > [ 11.663447] Call Trace: > > > [ 11.663449] <TASK> > > > [ 11.663451] ? device_del+0x79/0x5f0 > > > [ 11.663456] dump_stack_lvl+0x5b/0x73 > > > [ 11.663462] print_address_description.constprop.0+0x1f/0x1b0 > > > [ 11.663468] ? device_del+0x79/0x5f0 > > > [ 11.663471] ? device_del+0x79/0x5f0 > > > [ 11.663475] print_report.cold+0x3c/0x21c > > > [ 11.663481] ? lock_acquired+0x87/0x1e0 > > > [ 11.663484] ? lock_acquired+0x87/0x1e0 > > > [ 11.663489] ? device_del+0x79/0x5f0 > > > [ 11.663492] kasan_report+0xbf/0xf0 > > > [ 11.663498] ? device_del+0x79/0x5f0 > > > [ 11.663503] device_del+0x79/0x5f0 > > > [ 11.663509] ? device_remove_attrs+0x170/0x170 > > > [ 11.663514] ? lock_is_held_type+0xe8/0x140 > > > [ 11.663523] platform_device_del.part.0+0x19/0xe0 > > > [ 11.663530] platform_device_unregister+0x1c/0x30 > > > [ 11.663535] sysfb_disable+0x2d/0x70 > > > [ 11.663540] remove_conflicting_framebuffers+0x1c/0xf0 > > > [ 11.663546] remove_conflicting_pci_framebuffers+0x130/0x1a0 > > > [ 11.663554] drm_aperture_remove_conflicting_pci_framebuffers+0x86/0xb0 > > > [ 11.663561] ? mgag200_pci_remove+0x30/0x30 [mgag200] > > > [ 11.663578] mgag200_pci_probe+0x2d/0x140 [mgag200] > > > > > > > Maybe include a Reported-by: Zack Rusin <zackr@xxxxxxxxxx> ? since > > this seems to be the exact same issue that he reported yesterday. > > I'll do. > > > > > Patch looks good to me and this seems to be the correct fix IMO. > > That way we won't re-introduce the issue that was fixed by the > > sysfb_unregister() function, that is to remove a pdev even if wasn't > > bound to a driver to prevent a late simpledrm registration to match. > > > > Reviewed-by: Javier Martinez Canillas <javierm@xxxxxxxxxx> > > Thanks! > > > > > I wonder what's the best way to coordinate with Alex to merge this > > fix and your patch that moves the aperture code out of DRM, since > > as far as I can tell both should target the v5.20 release. > > If nothing else comes in, I'll merge this patch on Monday and send Alex > an updated version of the vfio patch. Thanks Thomas. I just tested it on our arm64 configs and it fixes the crash on boot we've been seeing and gets them running again. Reviewed-by: Zack Rusin <zackr@xxxxxxxxxx> z