Re: [PATCH 0/3] Allow the group FD to remain open when unplugging a device

On 10/6/22 6:42 PM, Jason Gunthorpe wrote:
> On Thu, Oct 06, 2022 at 01:53:15PM -0600, Alex Williamson wrote:
>> On Thu,  6 Oct 2022 09:40:35 -0300
>> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>>
>>> Testing has shown that virtnodedevd will leave the group FD open for long
>>> periods, even after all the cdevs have been destroyed. This blocks
>>> destruction of the VFIO device and is undesirable.
>>>
>>> That approach was selected to accommodate SPAPR, which has a broken
>>> lifecycle model for the iommu_group. However, we can accommodate SPAPR by
>>> realizing that it doesn't use the iommu core at all, so rules about
>>> iommu_group lifetime do not apply to it.
>>>
>>> Giving the KVM code its own kref on the iommu_group allows the VFIO core
>>> code to release its iommu_group reference earlier and we can remove the
>>> sleep that only existed for SPAPR.
>>>
>>> Jason Gunthorpe (3):
>>>   vfio: Add vfio_file_is_group()
>>>   vfio: Hold a reference to the iommu_group in kvm for SPAPR
>>>   vfio: Make the group FD disassociate from the iommu_group
>>>
>>>  drivers/vfio/pci/vfio_pci_core.c |  2 +-
>>>  drivers/vfio/vfio.h              |  1 -
>>>  drivers/vfio/vfio_main.c         | 90 +++++++++++++++++++++-----------
>>>  include/linux/vfio.h             |  1 +
>>>  virt/kvm/vfio.c                  | 45 +++++++++++-----
>>>  5 files changed, 94 insertions(+), 45 deletions(-)
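
As a rough illustration of the kref pattern the cover letter describes:
a second subsystem takes its own reference on the iommu_group so the
VFIO core can drop its reference earlier.  This is only a hedged sketch,
not the actual patch code: iommu_group_ref_get() and iommu_group_put()
are the real iommu core helpers, but the struct and function names are
invented here.

#include <linux/iommu.h>

/* Invented consumer that needs the group to outlive VFIO's reference */
struct demo_group_user {
	struct iommu_group *iommu_group;
};

static void demo_attach(struct demo_group_user *u, struct iommu_group *grp)
{
	/* Take our own reference, independent of the VFIO core's */
	u->iommu_group = iommu_group_ref_get(grp);
}

static void demo_detach(struct demo_group_user *u)
{
	/* Drop it; the group is freed only when the last ref goes away */
	iommu_group_put(u->iommu_group);
	u->iommu_group = NULL;
}

With a reference held this way, VFIO can put its own iommu_group
reference as soon as its device is unregistered instead of waiting for
the consumer to let go.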
>>
>> Containers aren't getting cleaned up with this series: after starting
>> and shutting down a libvirt-managed VM with vfio-pci devices, an mtty
>> mdev device, and hugepages in use, /proc/meminfo shows the hugepages
>> are not released on VM shutdown, and I'm unable to subsequently
>> restart the VM. Thanks,
> 
> Oh, I'm surprised the s390 testing didn't hit this!!

Huh, me too, at least eventually - I think it's because we aren't pinning everything upfront but rather on-demand, so the missing type1 release / vfio_iommu_unmap_unpin_all wouldn't be so obvious.  I definitely did multiple VM (re)starts and hot (un)plugs.  But while my test workloads did some I/O, the long-running one was focused on the plug/unplug scenarios to recreate the initial issue, so the I/O (and thus pinning) done would have been minimal.

-ap and -ccw also don't pin everything upfront (and I did far less testing with those).
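
To make "pinning on demand" concrete: an mdev-style driver only pins
the pages backing a specific request, at the time it maps them, so a
mostly idle workload pins very few pages and a missed unpin-all at
container teardown is easy to overlook.  A hedged sketch of that
pattern follows; vfio_pin_pages()/vfio_unpin_pages() are the real vfio
helpers (signatures as of the kernels around this series), while the
driver function itself is invented for illustration.

#include <linux/iommu.h>
#include <linux/vfio.h>

static int demo_do_request(struct vfio_device *vdev, dma_addr_t iova)
{
	struct page *page;
	int ret;

	/* Pin only the single page this request touches, at I/O time */
	ret = vfio_pin_pages(vdev, iova, 1, IOMMU_READ | IOMMU_WRITE, &page);
	if (ret != 1)
		return ret < 0 ? ret : -EFAULT;

	/* ... issue the I/O against the pinned page ... */

	vfio_unpin_pages(vdev, iova, 1);
	return 0;
}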

Ugh.  Moving forward, it might be worth seeing how I can fold some non-s390-specific vfio testing into my routine.

> 
> This hunk should remain, since not all group FD closures are due to
> device hot unplug:
> 
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index f9cb734d3991b3..62aba3a128fb8d 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -954,6 +954,13 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
>  	filep->private_data = NULL;
>  
>  	mutex_lock(&group->group_lock);
> +	/*
> +	 * Device FDs hold a group file reference, therefore the group release
> +	 * is only called when there are no open devices.
> +	 */
> +	WARN_ON(group->notifier.head);
> +	if (group->container)
> +		vfio_group_detach_container(group);
>  	group->opened_file = NULL;
>  	mutex_unlock(&group->group_lock);
>  	return 0;
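
The invariant that comment relies on can be sketched like this (a
hedged sketch with invented names, not the actual vfio code): each
device FD takes a reference on the group's struct file when it is
opened, so the group's ->release() cannot run while any device FD is
still open.  get_file() and fput() are the real VFS helpers.

#include <linux/file.h>
#include <linux/fs.h>

struct demo_device {
	struct file *group_file;	/* pins the group FD open */
};

static void demo_device_open(struct demo_device *dev, struct file *group_file)
{
	/* Each device FD holds a group file reference for its lifetime */
	dev->group_file = get_file(group_file);
}

static int demo_device_fops_release(struct inode *inode, struct file *filep)
{
	struct demo_device *dev = filep->private_data;

	/*
	 * Only after the last device FD drops its reference can the
	 * group file's ->release() above finally run.
	 */
	fput(dev->group_file);
	return 0;
}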

Anyway, FWIW, I folded this in and re-ran a brief series of -pci, -ccw and -ap tests on s390, and things still look good.  For completeness I'll start some longer-running pci tests next, but I expect those will be fine as well.


