Sorry for the late reply. After reading all your comments, I think I
will try solution 1. Rough sketches of both solutions follow below,
after the quoted proposals.

On 12/13/2016 03:12 AM, Alex Williamson wrote:
> On Mon, 12 Dec 2016 21:49:01 +0800
> Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote:
>
>> Hi,
>> I have 2 solutions (high-level designs) in mind; please see whether
>> they are acceptable, or which one is acceptable. I also have some
>> questions.
>>
>> 1. Block guest access during host recovery.
>>
>>    Add a new field, error_recovering, in struct vfio_pci_device to
>>    indicate host recovery status. The AER driver in the host will
>>    still do the link reset.
>>
>>    - Set error_recovering in the vfio-pci driver's error_detected,
>>      to block all kinds of user access (config space, mmio).
>>    - To solve the concurrency between the device reset and user
>>      access, check the device state [*] in the vfio-pci driver's
>>      resume to see whether the device reset is done. If it is,
>>      clear error_recovering; otherwise start a timer and check the
>>      device state periodically until the reset is done. (What if
>>      the device reset doesn't finish for a long time?)
>>    - In QEMU, translate a guest link reset to a host link reset.
>>      A question here: we already have a link reset in the host; is
>>      a second link reset necessary? Why?
>>
>>    [*] How to check the device state: read a certain config space
>>    register and check whether the returned value is valid or all
>>    F's.
>
> Isn't this exactly the path we were on previously?

Yes, it is basically the previous path, plus the optimization.

> There might be an
> optimization that we could skip back-to-back resets, but how can you
> necessarily infer that the resets are for the same thing? If the user
> accesses the device between resets, can you still guarantee the guest
> directed reset is unnecessary? If time passes between resets, do you
> know they're for the same event? How much time can pass between the
> host and guest reset to know they're for the same event? In the
> process of error handling, which is more important, speed or
> correctness?

I think the vfio driver itself won't know what each reset is for, and
I don't quite understand why vfio should care about that question; is
this a new question in the design? But I agree that user access
between the 2 resets could be trouble for guest recovery; a
misbehaving user could do things beyond our imagination.

Correctness is more important. If I understand you right, let me make
a summary: host recovery just does a link reset, which is incomplete,
so we'd better do a complete guest recovery for correctness.

>> 2. Skip the link reset in the AER driver of the host kernel for
>>    vfio-pci, and let the user decide how to do serious recovery.
>>
>>    Add a new field, user_driver, in struct pci_dev, used to skip
>>    the link reset for vfio-pci; add a new field, link_reset, in
>>    struct vfio_pci_device to indicate whether the link has been
>>    reset during recovery.
>>
>>    - Set user_driver in vfio_pci_probe(), to skip the link reset
>>      for vfio-pci in the host.
>>    - (Using a flag,) block user access (config, mmio) during host
>>      recovery. (Not sure whether this step is necessary.)
>>    - In QEMU, translate a guest link reset to a host link reset.
>>    - In the vfio-pci driver, set link_reset after
>>      VFIO_DEVICE_PCI_HOT_RESET is executed.
>>    - In the vfio-pci driver's resume, start a timer and check the
>>      link_reset field periodically. If it gets set within a
>>      reasonable time, clear it and delete the timer; otherwise the
>>      vfio-pci driver does the link reset itself!
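To make solution 1 concrete, here is a rough sketch of what I have in
mind. The error_recovering field, the recovery_work member, and the
polling helper are only my proposal (none of this exists in vfio-pci
yet); the retry bound is the open question noted above:

    /* vfio-pci's existing error_detected callback (which today only
     * signals the user's error eventfd) would additionally raise the
     * flag that the config/mmio access paths test */
    static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
                                                      pci_channel_state_t state)
    {
            struct vfio_device *device = vfio_device_get_from_dev(&pdev->dev);
            struct vfio_pci_device *vdev = vfio_device_data(device);

            atomic_set(&vdev->error_recovering, 1);
            vfio_device_put(device);
            return PCI_ERS_RESULT_CAN_RECOVER;
    }

    /* [*] the device-state check: a config read that returns all F's
     * means the device is not decoding config cycles yet */
    static bool vfio_pci_dev_responding(struct pci_dev *pdev)
    {
            u32 id;

            pci_read_config_dword(pdev, PCI_VENDOR_ID, &id);
            return id != 0xffffffff;
    }

    /* scheduled from resume(): poll until the device answers again,
     * then lift the access block */
    static void vfio_pci_recovery_poll(struct work_struct *work)
    {
            struct vfio_pci_device *vdev =
                    container_of(to_delayed_work(work),
                                 struct vfio_pci_device, recovery_work);

            if (vfio_pci_dev_responding(vdev->pdev))
                    atomic_set(&vdev->error_recovering, 0);
            else    /* how long to keep trying is still undecided */
                    schedule_delayed_work(&vdev->recovery_work, HZ / 10);
    }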
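And the matching sketch for solution 2's fallback path (same caveat:
link_reset, link_reset_work, and the grace period value are all made
up by me). The resume callback arms a one-shot delayed work; the
VFIO_DEVICE_PCI_HOT_RESET ioctl path sets link_reset, and if it is
still clear when the work fires, vfio-pci resets the link itself:

    #define LINK_RESET_GRACE (10 * HZ)  /* "reasonable time", value made up */

    static void vfio_pci_link_reset_fallback(struct work_struct *work)
    {
            struct vfio_pci_device *vdev =
                    container_of(to_delayed_work(work),
                                 struct vfio_pci_device, link_reset_work);

            /* set by the VFIO_DEVICE_PCI_HOT_RESET ioctl path */
            if (test_and_clear_bit(0, &vdev->link_reset))
                    return;

            /* the user never reset the link, so do it ourselves via
             * the upstream bridge (assumes we are not on a root bus) */
            pci_reset_bridge_secondary_bus(vdev->pdev->bus->self);
    }

    static void vfio_pci_aer_resume(struct pci_dev *pdev)
    {
            struct vfio_device *device = vfio_device_get_from_dev(&pdev->dev);
            struct vfio_pci_device *vdev = vfio_device_data(device);

            schedule_delayed_work(&vdev->link_reset_work, LINK_RESET_GRACE);
            vfio_device_put(device);
    }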
> What happens in the case of a multifunction device where each function
> is part of a separate IOMMU group and one function is hot-removed from
> the user? We can't do a link reset on that function since the other
> function is still in use. We have no choice but to release a device
> in an unknown state back to the host.

By "hot-removed from the user", do you mean, for example, that all
functions are assigned to the VM, and then suddenly a person does
something like the following:

  $ echo 0000:06:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind
  $ echo 0000:06:00.0 > /sys/bus/pci/drivers/igb/bind

to return the device to the host driver, or doesn't bind it to a host
driver at all, leaving it in a driver-less state?

> As previously discussed, we don't
> expect that any sort of function-level FLR will necessarily reset the
> device to the same state. I also don't really like vfio-pci taking
> over error handling capabilities from the PCI core. That's redundant
> code and extra maintenance overhead.

I understand the concern, so I suppose solution 1 is preferred.

>> A quick question:
>> I don't know how devices are divided into IOMMU groups; is it
>> possible for functions in a multifunction device to be split into
>> different groups?
>
> Yes, if a multifunction device supports ACS, or if we have quirks to
> expose that the functions do not perform internal peer-to-peer, then
> they may be in separate IOMMU groups, depending on the rest of the
> PCI topology. See:
>
> http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html
>
> Thanks,
> Alex

--
Sincerely,
Cao jin
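P.S. To make the multifunction point above concrete for myself: before
any link (secondary bus) reset, every function behind the same bus has
to be accounted for, which is effectively what the
VFIO_DEVICE_PCI_HOT_RESET ioctl already verifies through IOMMU group
ownership. A sketch of that check, where vfio_owned() is a
hypothetical predicate for "this function is held by the same vfio
user":

    static int vfio_pci_count_foreign(struct pci_dev *pdev, void *data)
    {
            int *foreign = data;

            if (!vfio_owned(pdev))  /* hypothetical helper */
                    (*foreign)++;
            return 0;
    }

    static bool vfio_pci_bus_resettable(struct pci_dev *pdev)
    {
            int foreign = 0;

            /* every function on this bus and below feels the reset */
            pci_walk_bus(pdev->bus, vfio_pci_count_foreign, &foreign);
            return foreign == 0;
    }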