Actually, I've just spotted drm_dev_unplug. Does it make sense to use it in
our pci_driver.remove hook instead of explicitly calling drm_dev_unregister()
and drm_dev_put(dev)? That way at least any subsequent IOCTL will fail with
ENODEV.

Andrey

On 08/29/2018 11:07 AM, Daniel Vetter wrote:
> On Wed, Aug 29, 2018 at 4:43 PM, Andrey Grodzovsky
> <Andrey.Grodzovsky at amd.com> wrote:
>> Just another ping...
>>
>> Daniel, Dave - maybe you could give some advice on that?
>>
>> P.S. I tried the same with an Intel card (i915 driver) on a 4.18.1 kernel
>> to get a reference point, but it just hung.
> drm_device hot-unplug is de facto unsolved. We've only just started to
> fix the most obvious races around the refcounting of drm_device
> itself, see the work from Noralf Tronnes around drm_dev_get/put.
>
> No one has even started to think about what it would take to correctly
> refcount a full-blown memory manager to handle hot-unplug. I'd expect
> lots of nightmares. The real horror is that it's not just the
> drm_device, but also lots of things we're exporting: dma_buf,
> dma_fence, ... All of that must be handled one way or the other.
>
> So expect your kernel to Oops when you unplug a device.
>
> Wrt userspace handling this: Probably an even bigger question. No
> idea, and will depend upon what userspace you're running.
> -Daniel
>
>> Andrey
>>
>> On 08/27/2018 12:04 PM, Andrey Grodzovsky wrote:
>>> Hi everybody, I am trying to resolve various problems I observe when
>>> logically removing an AMDGPU device from PCI with
>>>
>>>   echo 1 > /sys/class/drm/card0/device/remove
>>>
>>> One of the problems I encountered was hitting WARNs in
>>> amdgpu_gem_force_release. It complains about still-open client FDs and
>>> BO allocations, which is expected since we didn't let userspace clients
>>> know about the device removal, and hence they won't release their
>>> allocations and won't close their FDs.
>>>
>>> Question: how do other drivers handle this use case, especially eGPUs,
>>> since they may indeed be unplugged at any moment? Is there any way to
>>> notify Xorg and other clients about this so they have a chance to
>>> release all their allocations and probably terminate? Maybe some kind
>>> of uevent?
>>>
>>> Andrey