On 10/6/22 18:31, Alex Williamson wrote:
> On Thu, 6 Oct 2022 08:37:09 -0300
> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
>> On Wed, Oct 05, 2022 at 04:03:56PM -0600, Alex Williamson wrote:
>>> We can't have a .remove callback that does nothing, this breaks
>>> removing the device while it's in use. Once we have the
>>> vfio_unregister_group_dev() fix below, we'll block until the device is
>>> unused, at which point vgpu->attached becomes false. Unless I'm
>>> missing something, I think we should also follow-up with a patch to
>>> remove that bogus warn-on branch, right? Thanks,
>>
>> Yes, looks right to me.
>>
>> I question all the logic around attached, where is the locking?
>
> Zhenyu, Zhi, Kevin,
>
> Could someone please take a look at use of vgpu->attached in the GVT-g
> driver? Its use in intel_vgpu_remove() is bogus, the .release
> callback needs to use vfio_unregister_group_dev() to wait for the
> device to be unused. The WARN_ON/return here breaks all future use of
> the device. I assume @attached has something to do with the page table
> interface with KVM, but it all looks racy anyway.
>

Thanks for pointing this out. It was introduced in the GVT-g refactor
patch series; Christoph probably did not want to touch vgpu->released
when he needed a new state.

I dug into it a bit. vgpu->attached is used to prevent multiple opens of
a single vGPU and to indicate that kvm_get_kvm() has been done.
vgpu->released was meant to prevent release before close, which is now
handled by the vfio_device_* code.

What I would like to do:

1) Remove vgpu->released.
2) Use a lock to protect vgpu->attached.

Once those are done, the WARN_ON/return in intel_vgpu_remove() can be
safely removed, as .release will only be called after .close_device
decreases the vfio_device->refcnt to zero.

Thanks,
Zhi.

> Also, whatever purpose vgpu->released served looks unnecessary now.
> Thanks,
>
> Alex
>
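
To illustrate the second item in the plan above (protecting vgpu->attached with a lock), here is a rough userspace sketch, not the actual GVT-g change. Everything in it is hypothetical: struct demo_vgpu and the demo_vgpu_*() helpers are stand-ins invented for this example, a pthread mutex stands in for whatever kernel lock the driver would end up using, and -EBUSY is an arbitrary choice of error code. The only point is that the "already open?" check and the update of the flag happen under the same lock, so two concurrent opens cannot both succeed.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vGPU state; not the real struct intel_vgpu. */
struct demo_vgpu {
	pthread_mutex_t lock;	/* protects 'attached' */
	bool attached;		/* true while the device is open */
};

void demo_vgpu_init(struct demo_vgpu *vgpu)
{
	pthread_mutex_init(&vgpu->lock, NULL);
	vgpu->attached = false;
}

/*
 * Models the open path: the test and the set of 'attached' are done
 * under one lock, so a second concurrent open is rejected.
 * The error code is an assumption made for this sketch.
 */
int demo_vgpu_open(struct demo_vgpu *vgpu)
{
	int ret = 0;

	pthread_mutex_lock(&vgpu->lock);
	if (vgpu->attached)
		ret = -EBUSY;
	else
		vgpu->attached = true;
	pthread_mutex_unlock(&vgpu->lock);

	return ret;
}

/* Models the close path: clears 'attached' under the same lock. */
void demo_vgpu_close(struct demo_vgpu *vgpu)
{
	pthread_mutex_lock(&vgpu->lock);
	vgpu->attached = false;
	pthread_mutex_unlock(&vgpu->lock);
}
```

With this shape, a remove path that runs only after the last close would always observe attached == false, which is why the WARN_ON/return would no longer be needed.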