On Mon, 12 Dec 2016 21:49:01 +0800
Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote:

> Hi,
> I have 2 solutions(high level design) came to me, please see if they are
> acceptable, or which one is acceptable. Also have some questions.
>
> 1. block guest access during host recovery
>
>    add new field error_recovering in struct vfio_pci_device to
>    indicate host recovery status. aer driver in host will still do
>    reset link
>
>    - set error_recovering in vfio-pci driver's error_detected, used to
>      block all kinds of user access(config space, mmio)
>    - in order to solve concurrent issue of device resetting & user
>      access, check device state[*] in vfio-pci driver's resume, see if
>      device reset is done, if it is, then clear"error_recovering", or
>      else new a timer, check device state periodically until device
>      reset is done. (what if device reset don't end for a long time?)
>    - In qemu, translate guest link reset to host link reset.
>      A question here: we already have link reset in host, is a second
>      link reset necessary? why?
>
>    [*] how to check device state: reading certain config space
>        register, check return value is valid or not(All F's)

Isn't this exactly the path we were on previously?  There might be an
optimization that we could skip back-to-back resets, but how can you
necessarily infer that the resets are for the same thing?  If the user
accesses the device between resets, can you still guarantee the guest
directed reset is unnecessary?  If time passes between resets, do you
know they're for the same event?  How much time can pass between the
host and guest reset to know they're for the same event?  In the
process of error handling, which is more important, speed or
correctness?

> 2. skip link reset in aer driver of host kernel, for vfio-pci.
>    Let user decide how to do serious recovery
>
>    add new field "user_driver" in struct pci_dev, used to skip link
>    reset for vfio-pci; add new field "link_reset" in struct
>    vfio_pci_device to indicate link has been reset or not during
>    recovery
>
>    - set user_driver in vfio_pci_probe(), to skip link reset for
>      vfio-pci in host.
>    - (use a flag)block user access(config, mmio) during host recovery
>      (not sure if this step is necessary)
>    - In qemu, translate guest link reset to host link reset.
>    - In vfio-pci driver, set link_reset after VFIO_DEVICE_PCI_HOT_RESET
>      is executed
>    - In vfio-pci driver's resume, new a timer, check "link_reset" field
>      periodically, if it is set in reasonable time, then clear it and
>      delete timer, or else, vfio-pci driver will does the link reset!

What happens in the case of a multifunction device where each function
is part of a separate IOMMU group and one function is hot-removed from
the user?  We can't do a link reset on that function since the other
function is still in use.  We have no choice but to release a device in
an unknown state back to the host.  As previously discussed, we don't
expect that any sort of function-level FLR will necessarily reset the
device to the same state.  I also don't really like vfio-pci taking
over error handling capabilities from the PCI-core.  That's redundant
code and extra maintenance overhead.

> A quick question:
> I don't know how devices is divided into iommu groups, is it possible
> for functions in a multi-function device to be split into different groups?

Yes, if a multifunction device supports ACS or if we have quirks to
expose that the functions do not perform internal peer-to-peer, then
they may be in separate IOMMU groups, depending on the rest of the PCI
topology.
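
To see how the functions of a device are actually grouped on a given
host, one option is to walk /sys/kernel/iommu_groups, where each group
directory has a devices/ subdirectory named by PCI address.  A minimal
sketch, assuming that standard sysfs layout; the function names and the
sysfs_root parameter are hypothetical helpers for illustration, not
part of any kernel or vfio interface:

```python
import os
from collections import defaultdict


def functions_by_iommu_group(sysfs_root="/sys/kernel/iommu_groups"):
    """Map each PCI slot (domain:bus:device) to {function: group}.

    Returns an empty dict if sysfs_root does not exist, e.g. when the
    IOMMU is disabled on the host.
    """
    slots = defaultdict(dict)
    if not os.path.isdir(sysfs_root):
        return {}
    for group in sorted(os.listdir(sysfs_root)):
        devices_dir = os.path.join(sysfs_root, group, "devices")
        if not os.path.isdir(devices_dir):
            continue
        for bdf in os.listdir(devices_dir):
            # PCI addresses look like 0000:01:00.1; skip anything else
            # (platform devices can also appear in IOMMU groups).
            slot, sep, function = bdf.rpartition(".")
            if not sep or slot.count(":") != 2:
                continue
            slots[slot][function] = group
    return dict(slots)


def split_devices(slots):
    """Keep only the slots whose functions span more than one group."""
    return {slot: funcs for slot, funcs in slots.items()
            if len(set(funcs.values())) > 1}
```

Calling split_devices(functions_by_iommu_group()) lists the
multifunction devices whose functions sit in different groups, which
are the devices where a link reset on behalf of one function would also
hit a function owned by someone else.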
See:

http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html

Thanks,
Alex