On Tue, Feb 06, 2024 at 07:42:49PM +0100, Christian König wrote:
> On 06.02.24 at 15:29, Daniel Vetter wrote:
> > On Fri, Feb 02, 2024 at 03:40:03PM -0800, Greg Kroah-Hartman wrote:
> > > On Fri, Feb 02, 2024 at 05:25:56PM -0500, Hamza Mahfooz wrote:
> > > > Removing an amdgpu device that still has user space references allocated
> > > > to it causes undefined behaviour.
> > > Then fix that please. There should not be anything special about your
> > > hardware that all of the tens of thousands of other devices can't handle
> > > today.
> > >
> > > What happens when I yank your device out of a system with a pci hotplug
> > > bus? You can't prevent that either, so this should not be any different
> > > at all.
> > >
> > > sorry, but please, just fix your driver.
> > fwiw Christian König from amd already rejected this too, I have no idea
> > why this was submitted
>
> Well that was my fault.
>
> I commented on an internal bug tracker that when sysfs bind/unbind is a
> different code path from PCI remove/re-scan we could try to reject it.
>
> Turned out it isn't a different code path.

Yeah it's exactly the same code, and removing the sysfs stuff means we
can't test hotunplug without physical hotunplugging stuff anymore. So
really not great - if one is buggy so is the other, and sysfs allows us to
control the timing a lot better to hit specific issues.
-Sima

> > since the very elaborate plan I developed with a
> > bunch of amd folks was to fix the various lifetime lolz we still have in
> > drm. We unfortunately export the world of internal objects to userspace as
> > uabi objects with dma_buf, dma_fence and everything else, but it's all
> > fixable and we have the plan even documented:
> >
> > https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#device-hot-unplug
> >
> > So yeah anything that isn't that plan of record is very much no-go for drm
> > drivers. Unless we change that plan of course, but that needs a
> > documentation patch first and a big discussion.
> >
> > Aside from an absolute massive pile of kernel-internal refcounting bugs
> > the really big one we agreed on after a lot of discussion is that SIGBUS
> > on dma-buf mmaps is no-go for drm drivers, because it would break way too
> > much userspace in ways which are simply not fixable (since sig handlers
> > are shared in a process, which means the gl/vk driver cannot use it).
> >
> > Otherwise it's bog standard "fix the kernel bugs" work, just a lot of it.
>
> Ignoring a few memory leaks because of messed up refcounting we actually got
> that working quite nicely.
>
> At least hot unplug / hot add seems to be working rather reliably in our
> internal testing.
>
> So it can't be that messed up.
>
> Regards,
> Christian.
>
> >
> > Cheers, Sima
>
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch