Re: [PATCH 0/3] Allow the group FD to remain open when unplugging a device

On 10/6/22 7:28 PM, Matthew Rosato wrote:
> On 10/6/22 6:42 PM, Jason Gunthorpe wrote:
>> On Thu, Oct 06, 2022 at 01:53:15PM -0600, Alex Williamson wrote:
>>> On Thu,  6 Oct 2022 09:40:35 -0300
>>> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>>>
>>>> Testing has shown that virtnodedevd will leave the group FD open for long
>>>> periods, even after all the cdevs have been destroyed. This blocks
>>>> destruction of the VFIO device and is undesirable.
>>>>
>>>> That approach was selected to accommodate SPAPR, which has a broken
>>>> lifecycle model for the iommu_group. However, we can accommodate SPAPR by
>>>> realizing that it doesn't use the iommu core at all, so rules about
>>>> iommu_group lifetime do not apply to it.
>>>>
>>>> Giving the KVM code its own kref on the iommu_group allows the VFIO core
>>>> code to release its iommu_group reference earlier and we can remove the
>>>> sleep that only existed for SPAPR.
>>>>
>>>> Jason Gunthorpe (3):
>>>>   vfio: Add vfio_file_is_group()
>>>>   vfio: Hold a reference to the iommu_group in kvm for SPAPR
>>>>   vfio: Make the group FD disassociate from the iommu_group
>>>>
>>>>  drivers/vfio/pci/vfio_pci_core.c |  2 +-
>>>>  drivers/vfio/vfio.h              |  1 -
>>>>  drivers/vfio/vfio_main.c         | 90 +++++++++++++++++++++-----------
>>>>  include/linux/vfio.h             |  1 +
>>>>  virt/kvm/vfio.c                  | 45 +++++++++++-----
>>>>  5 files changed, 94 insertions(+), 45 deletions(-)
>>>
>>> Containers aren't getting cleaned up with this series. Starting and
>>> shutting down a libvirt-managed VM with vfio-pci devices, an mtty mdev
>>> device, and hugepages in use, /proc/meminfo shows the hugepages are
>>> not released on VM shutdown and I'm unable to subsequently restart
>>> the VM. Thanks,
>>
>> Oh, I'm surprised the s390 testing didn't hit this!!
> 
> Huh, me too, at least eventually - I think it's because we aren't pinning everything upfront but rather on demand, so the missing type1 release / vfio_iommu_unmap_unpin_all wouldn't be so obvious.  I definitely did multiple VM (re)starts and hot (un)plugs.  But while my test workloads did some I/O, the long-running one was focused on the plug/unplug scenarios to recreate the initial issue, so the I/O (and thus pinning) done would have been minimal.
> 
> -ap and -ccw also don't pin everything upfront (and I did far less testing with those).
> 
> Ugh.  Moving forward, it might be worth seeing how I can loop some non-s390-specific vfio testing into my routine.
> 
>>
>> This hunk should remain since not all cases are closures due to device
>> hot unplug
>>
>> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
>> index f9cb734d3991b3..62aba3a128fb8d 100644
>> --- a/drivers/vfio/vfio_main.c
>> +++ b/drivers/vfio/vfio_main.c
>> @@ -954,6 +954,13 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
>>  	filep->private_data = NULL;
>>  
>>  	mutex_lock(&group->group_lock);
>> +	/*
>> +	 * Device FDs hold a group file reference, therefore the group release
>> +	 * is only called when there are no open devices.
>> +	 */
>> +	WARN_ON(group->notifier.head);
>> +	if (group->container)
>> +		vfio_group_detach_container(group);
>>  	group->opened_file = NULL;
>>  	mutex_unlock(&group->group_lock);
>>  	return 0;
> 
> Anyway, FWIW, I folded this in and re-ran a brief series of -pci, -ccw and -ap tests on s390 and things still look good.  For completeness I'll start some longer-running pci tests next but I expect this will still be fine as well.

Further s390 vfio-pci testing still looks good with this change too.


