On Wed, Aug 29, 2018 at 8:28 PM, Andrey Grodzovsky
<Andrey.Grodzovsky at amd.com> wrote:
> Actually, I've just spotted this drm_dev_unplug. Does it make sense to use
> it in our pci_driver.remove hook instead of explicitly doing
> drm_dev_unregister and drm_dev_put(dev)?
>
> This way at least any following IOCTL will fail with ENODEV.

Definitely. The problem is still that the refcounting beyond the
drm_device is totally screwed up, and your kernel will Oops eventually.
-Daniel

>
> Andrey
>
>
> On 08/29/2018 11:07 AM, Daniel Vetter wrote:
>>
>> On Wed, Aug 29, 2018 at 4:43 PM, Andrey Grodzovsky
>> <Andrey.Grodzovsky at amd.com> wrote:
>>>
>>> Just another ping...
>>>
>>> Daniel, Dave, maybe you could give some advice on that?
>>>
>>> P.S. I tried the same with an Intel card (i915 driver) on a 4.18.1
>>> kernel to get some reference point, but it just hung.
>>
>> drm_device hot-unplug is de facto unsolved. We've only just started to
>> fix the most obvious races around the refcounting of drm_device
>> itself; see the work from Noralf Tronnes around drm_dev_get/put.
>>
>> No one has even started to think about what it would take to correctly
>> refcount a full-blown memory manager to handle hotunplug. I'd expect
>> lots of nightmares. The real horror is that it's not just the
>> drm_device, but also lots of things we're exporting: dma_buf,
>> dma_fence, ... All of that must be handled one way or the other.
>>
>> So expect your kernel to Oops when you unplug a device.
>>
>> Wrt userspace handling this: Probably an even bigger question. No
>> idea, and will depend upon what userspace you're running.
>> -Daniel
>>
>>> Andrey
>>>
>>> On 08/27/2018 12:04 PM, Andrey Grodzovsky wrote:
>>>>
>>>> Hi everybody, I am trying to resolve various problems I observe when
>>>> logically removing an AMDGPU device from PCI:
>>>>
>>>>   echo 1 > /sys/class/drm/card0/device/remove
>>>>
>>>> One of the problems I encountered was hitting WARNs in
>>>> amdgpu_gem_force_release. It complains about still-open client FDs
>>>> and BO allocations, which is obvious, since we didn't let userspace
>>>> clients know about the device removal, and hence they won't release
>>>> their allocations and won't close their FDs.
>>>>
>>>> Question: how do other drivers handle this use case, especially eGPUs,
>>>> since they may indeed be extracted at any moment? Is there any way to
>>>> notify Xorg and other clients about this so they have a chance to
>>>> release all their allocations and probably terminate? Maybe some kind
>>>> of uevent?
>>>>
>>>> Andrey
>>>>
>>
>>
>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch