On Wed, Aug 29, 2018 at 4:43 PM, Andrey Grodzovsky
<Andrey.Grodzovsky at amd.com> wrote:
> Just another ping...
>
> Daniel, Dave - maybe you could give some advice on that?
>
> P.S. I tried with an Intel card (i915 driver) on a 4.18.1 kernel to do the
> same to get some reference point, but it just hung.

drm_device hot-unplug is de facto unsolved. We've only just started to
fix the most obvious races around the refcounting of drm_device
itself, see the work from Noralf Tronnes around drm_dev_get/put.

No one has even started to think about what it would take to correctly
refcount a full-blown memory manager to handle hotunplug. I'd expect
lots of nightmares. The real horror is that it's not just the
drm_device, but also lots of things we're exporting: dma_buf,
dma_fence, ... All of that must be handled one way or the other.

So expect your kernel to Oops when you unplug a device.

Wrt userspace handling this: Probably an even bigger question. No
idea, and it will depend upon what userspace you're running.
-Daniel

>
> Andrey
>
>
> On 08/27/2018 12:04 PM, Andrey Grodzovsky wrote:
>>
>> Hi everybody, I am trying to resolve various problems I observe when
>> logically removing an AMDGPU device from PCI:
>>
>> echo 1 > /sys/class/drm/card0/device/remove
>>
>> One of the problems I encountered was hitting WARNs in
>> amdgpu_gem_force_release. It complains about still-open client FDs and BO
>> allocations, which is expected since we didn't let user space clients know
>> about the device removal and hence they won't release allocations and
>> won't close their FDs.
>>
>> Question - how do other drivers handle this use case, especially eGPUs,
>> since they may indeed be removed at any moment? Is there any way to notify
>> Xorg and other clients about this so they have a chance to release all
>> their allocations and probably terminate? Maybe some kind of uevent?
>>
>> Andrey
>>
>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
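
On the "maybe some kind of uevent?" question quoted above: the sketch below shows how a userspace client could watch for removal events, assuming the driver core emits its usual ACTION=remove uevent for the DRM device node when the device goes away. It does not address the harder problem in the reply (what the kernel and clients must do afterwards), and nothing in this thread says Xorg or any other client reacts to such an event. The function names are hypothetical; the netlink protocol number and the NUL-separated KEY=VALUE payload layout are the standard kernel uevent interface.

```python
import socket

NETLINK_KOBJECT_UEVENT = 15  # protocol number from <linux/netlink.h>
KERNEL_UEVENT_GROUP = 1      # multicast group carrying kernel-originated uevents


def parse_uevent(raw: bytes) -> dict:
    """Parse one kernel uevent datagram into a dict of KEY=VALUE fields.

    The payload is a NUL-separated list: a leading "action@devpath"
    summary entry, followed by KEY=VALUE entries.
    """
    fields = {}
    # Skip the leading "action@devpath" summary; keep only KEY=VALUE entries.
    for part in raw.split(b"\0")[1:]:
        key, sep, value = part.partition(b"=")
        if sep:
            fields[key.decode(errors="replace")] = value.decode(errors="replace")
    return fields


def is_drm_removal(fields: dict) -> bool:
    """True if the event reports removal of a device in the drm subsystem."""
    return fields.get("ACTION") == "remove" and fields.get("SUBSYSTEM") == "drm"


def watch_drm_removals() -> None:
    """Block forever, printing each DRM removal the kernel reports (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM,
                         NETLINK_KOBJECT_UEVENT)
    # Address is (pid, groups); pid 0 lets the kernel pick a port id.
    sock.bind((0, KERNEL_UEVENT_GROUP))
    while True:
        fields = parse_uevent(sock.recv(8192))
        if is_drm_removal(fields):
            # Hypothetical reaction point: a real client would drop its
            # buffers and close its FDs for this device here.
            print("DRM device removed:", fields.get("DEVPATH"))
```

Calling watch_drm_removals() in one terminal and then running the echo 1 > .../remove command from the original mail in another would be one way to check whether the event shows up on a given kernel. Note that the notification only arrives after removal has started, so it cannot by itself prevent the WARNs described above.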