On Mon, 20 Mar 2017 16:32:33 +0200 "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote: > On Mon, Mar 20, 2017 at 08:50:39PM +0800, Cao jin wrote: > > Sorry for late. > > > > On 03/14/2017 06:06 AM, Alex Williamson wrote: > > > On Mon, 27 Feb 2017 15:28:43 +0800 > > > Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote: > > > > > >> 0. What happens now (PCIE AER only) > > >> Fatal errors cause a link reset. > > >> Non fatal errors don't. > > >> All errors stop the VM eventually, but not immediately > > >> because it's detected and reported asynchronously. > > >> Interrupts are forwarded as usual. > > >> Correctable errors are not reported to guest at all. > > >> Note: PPC EEH is different. This focuses on AER. > > > > > > Perhaps you're only focusing on AER, but don't the error handlers we're > > > using support both AER and EEH generically? I don't think we can > > > completely disregard how this affects EEH behavior, if at all. > > > > > > > After taking a rough look at the EEH, find that EEH always feed > > error_detected with pci_channel_io_frozen, from perspective of > > error_detected, EEH is not affected. > > > > I am not sure about a question: when assign devices in spapr host, > > should all functions/devices in a PE be bound to vfio? I am kind of > > confused about the relationship between a PE & a tce iommu group > > > > >> > > >> 1. Correctable errors > > >> There is no need to report these to guest. So let's not. > > > > > > What does this patch change to make this happen? I don't see > > > anything. Was this always the case? No change? > > > > > > > yes, no change on correctable error. > > > > >> > > >> 2. Fatal errors > > >> It's not easy to handle them gracefully since link reset > > >> is needed. As a first step, let's use the existing mechanism > > >> in that case. > > > > > > Ok, so no change here either. > > > > > >> 2. Non-fatal errors > > >> Here we could make progress by reporting them to guest > > >> and have guest handle them. > > > > > > In practice, what actual errors do we expect userspace to see as > > > non-fatal errors? It would be useful for the commit log to describe > > > the actual benefit we're going to see by splitting out non-fatal errors > > > for the user (not always a guest) to see separately. Justify that this > > > is actually useful. > > > > > >> > > >> Issues: > > >> a. this behaviour should only be enabled with new userspace, > > >> old userspace should work without changes. > > >> > > >> Suggestion: One way to address this would be to add a new eventfd > > >> non_fatal_err_trigger. If not set, invoke err_trigger. > > > > > > This outline format was really more useful for Michael to try to > > > generate discussion, for a commit log, I'd much rather see a definitive > > > statement such as: > > > > > > "To maintain backwards compatibility with userspace, non-fatal errors > > > will continue to trigger via the existing error interrupt index if a > > > non-fatal signaling mechanism has not been registered." > > > > > >> b. drivers are supposed to stop MMIO when error is reported, > > >> if vm keeps going, we will keep doing MMIO/config. > > >> > > >> Suggestion 1: ignore this. vm stop happens much later when > > >> userspace runs anyway, so we are not making things much worse. > > >> > > >> Suggestion 2: try to stop MMIO/config, resume on resume call > > >> > > >> Patch below implements Suggestion 1. > > >> > > >> Note that although this is really against the documentation, which > > >> states error_detected() is the point at which the driver should quiesce > > >> the device and not touch it further (until diagnostic poking at > > >> mmio_enabled or full access at resume callback). > > >> > > >> Fixing this won't be easy. However, this is not a regression. > > >> > > >> Also note this does nothing about interrupts, documentation > > >> suggests returning IRQ_NONE until reset. > > >> Again, not a regression. > > > > > > So again, no change here. I'm not sure what this adds to the commit > > > log, perhaps we can reference this as a link to Michael's original > > > proposal. > > > > > >> c. PF driver might detect that function is completely broken, > > >> if vm keeps going, we will keep doing MMIO/config. > > >> > > >> Suggestion 1: ignore this. vm stop happens much later when > > >> userspace runs anyway, so we are not making things much worse. > > >> > > >> Suggestion 2: detect this and invoke err_trigger to stop VM. > > >> > > >> Patch below implements Suggestion 2. > > > > > > This needs more description and seems a bit misleading. This patch > > > adds a slot_reset handler, such that if the slot is reset, we notify > > > the user, essentially promoting the non-fatal error to fatal. But what > > > condition gets us to this point? AIUI, AER is a voting scheme and if > > > any driver affected says they need a reset, everyone gets a reset. So > > > the PF driver we're talking about here is not vfio-pci and it's not the > > > user, the user has no way to signal that the device is completely > > > broken, this only handles the case of other collateral devices with > > > native host drivers that might signal this, right? > > > > > > > Yes, same understanding as you, if I don't miss something from michael. > > > It seems like this is where this patch has the greatest exposure to > > > regressions. If we take the VM use case, previously we could have a > > > non-AER aware guest and the hypervisor could stop the VM on all > > > errors. Now the hypervisor might support the distinction between fatal > > > and non-fatal, but the guest may still not have AER support. That > > > doesn't imply a problem with this approach, the user (hypervisor) would > > > be at fault for any difference in handling in that case. > > > > > > > >> > > >> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev) > > >> +{ > > >> + struct vfio_pci_device *vdev; > > >> + struct vfio_device *device; > > >> + static pci_ers_result_t err = PCI_ERS_RESULT_NONE; > > >> + > > >> + device = vfio_device_get_from_dev(&pdev->dev); > > >> + if (!device) > > >> + goto err_dev; > > >> + > > >> + vdev = vfio_device_data(device); > > >> + if (!vdev) > > >> + goto err_data; > > >> + > > >> + mutex_lock(&vdev->igate); > > >> + > > >> + if (vdev->err_trigger) > > >> + eventfd_signal(vdev->err_trigger, 1); > > > > > > What about the case where the user has not registered for receiving > > > non-fatal errors, now we send an error signal on both error_detected > > > and slot_reset. Is that useful/desirable? > > > > > > > Not desirable, but seems not harmful, guest user will stop anyway. How > > to avoid this case gracefully seems not easy. > > I actually see a clean way to do this. > > Let's add yet another eventfd to trigger > when hosts resets the device itself. vdev->host_reset ? > > Users can use the same one as err_trigger if they like, > it will be up to them. > > Alex? Sure, the UNIX way, throw more file descriptors at the problem. Kind of ugly, but I don't have a cleaner solution in mind. "host_reset" implies something completely different to me though.