On 4/4/23 17:29, Liu, Yi L wrote: >> From: Eric Auger <eric.auger@xxxxxxxxxx> >> Sent: Tuesday, April 4, 2023 11:19 PM >> >> Hi Yi, >> >> On 4/4/23 16:37, Liu, Yi L wrote: >>> Hi Eric, >>> >>>> From: Eric Auger <eric.auger@xxxxxxxxxx> >>>> Sent: Tuesday, April 4, 2023 10:00 PM >>>> >>>> Hi YI, >>>> >>>> On 4/1/23 16:44, Yi Liu wrote: >>>>> If the affected device is not opened by any user, it's safe to reset it >>>>> given it's not in use. >>>>> >>>>> Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx> >>>>> Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx> >>>>> Tested-by: Yanting Jiang <yanting.jiang@xxxxxxxxx> >>>>> Signed-off-by: Yi Liu <yi.l.liu@xxxxxxxxx> >>>>> --- >>>>> drivers/vfio/pci/vfio_pci_core.c | 14 +++++++++++--- >>>>> include/uapi/linux/vfio.h | 8 ++++++++ >>>>> 2 files changed, 19 insertions(+), 3 deletions(-) >>>>> >>>>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c >>>>> index 65bbef562268..5d745c9abf05 100644 >>>>> --- a/drivers/vfio/pci/vfio_pci_core.c >>>>> +++ b/drivers/vfio/pci/vfio_pci_core.c >>>>> @@ -2429,10 +2429,18 @@ static int vfio_pci_dev_set_hot_reset(struct >>>> vfio_device_set *dev_set, >>>>> list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) { >>>>> /* >>>>> - * Test whether all the affected devices are contained by the >>>>> - * set of groups provided by the user. >>>>> + * Test whether all the affected devices can be reset by the >>>>> + * user. >>>>> + * >>>>> + * Resetting an unused device (not opened) is safe, because >>>>> + * dev_set->lock is held in hot reset path so this device >>>>> + * cannot race being opened by another user simultaneously. >>>>> + * >>>>> + * Otherwise all opened devices in the dev_set must be >>>>> + * contained by the set of groups provided by the user. >>>>> */ >>>>> - if (!vfio_dev_in_groups(cur_vma, groups)) { >>>>> + if (cur_vma->vdev.open_count && >>>>> + !vfio_dev_in_groups(cur_vma, groups)) { >>>>> ret = -EINVAL; >>>>> goto err_undo; >>>>> } >>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h >>>>> index 0552e8dcf0cb..f96e5689cffc 100644 >>>>> --- a/include/uapi/linux/vfio.h >>>>> +++ b/include/uapi/linux/vfio.h >>>>> @@ -673,6 +673,14 @@ struct vfio_pci_hot_reset_info { >>>>> * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13, >>>>> * struct vfio_pci_hot_reset) >>>>> * >>>>> + * Userspace requests hot reset for the devices it uses. Due to the >>>>> + * underlying topology, multiple devices can be affected in the reset >>>> by the reset >>>>> + * while some might be opened by another user. To avoid interference >>>> s/interference/hot reset failure? >>> I don’t think user can really avoid hot reset failure since there may >>> be new devices plugged into the affected slot. Even user has opened >> I don't know the legacy wrt that issue but this sounds a serious issue, >> meaning the reset of an assigned device could impact another device >> belonging to another group not not owned by the user? > but the hot reset shall fail as the group is not owned by the user. sure it shall but I fail to understand if the reset fails or the device plug is somehow delayed until the reset completes. > >>> all the groups/devices reported by VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, >>> the hot reset can fail if new device is plugged in and has not been >>> bound to vfio or opened by another user during the window of >>> _INFO and HOT_RESET. >> with respect to the latter isn't the dev_set lock held during the hot >> reset and sufficient to prevent any new opening to occur? > yes. new open needs to acquire the dev_set lock. So when hot reset > acquires the dev_set lock, then no new open can occur. > > Regards, > Yi Liu > >>> maybe the whole statement should be as below: >>> >>> To avoid interference, the hot reset can only be conducted when all >>> the affected devices are either opened by the calling user or not >>> opened yet at the moment of the hot reset attempt. >> OK >> >> Eric >>>>> + * the calling user must ensure all affected devices, if opened, are >>>>> + * owned by itself. >>>>> + * >>>>> + * The ownership is proved by an array of group fds. >>>>> + * >>>>> * Return: 0 on success, -errno on failure. >>>>> */ >>>>> struct vfio_pci_hot_reset { >>> Regards, >>> Yi Liu