> On Apr 8, 2022, at 11:28 PM, Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx> wrote:
>
> On 2022-04-08 04:45, Shuotao Xu wrote:
>> Adding PCIe Hotplug Support for AMDKFD: the support of hot-plug of GPU
>> devices can open doors for many advanced applications in data center
>> in the next few years, such as for GPU resource
>> disaggregation. Current AMDKFD does not support hotplug out b/o the
>> following reasons:
>>
>> 1. During PCIe removal, decrement KFD lock which was incremented at
>> the beginning of hw fini; otherwise kfd_open later is going to
>> fail.
>
> I assumed you read my comment last time, still you do same approach.
> More in details bellow

Aha, I like your fix :) I was not familiar with the DRM APIs, so I only half understood your comment last time.

BTW, I tried hot-plugging out a GPU while a ROCm application was still running.