On Mon, Mar 20, 2017 at 08:50:39PM +0800, Cao jin wrote: > Sorry for late. > > On 03/14/2017 06:06 AM, Alex Williamson wrote: > > On Mon, 27 Feb 2017 15:28:43 +0800 > > Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote: > > > >> 0. What happens now (PCIE AER only) > >> Fatal errors cause a link reset. > >> Non fatal errors don't. > >> All errors stop the VM eventually, but not immediately > >> because it's detected and reported asynchronously. > >> Interrupts are forwarded as usual. > >> Correctable errors are not reported to guest at all. > >> Note: PPC EEH is different. This focuses on AER. > > > > Perhaps you're only focusing on AER, but don't the error handlers we're > > using support both AER and EEH generically? I don't think we can > > completely disregard how this affects EEH behavior, if at all. > > > > After taking a rough look at the EEH, find that EEH always feed > error_detected with pci_channel_io_frozen, from perspective of > error_detected, EEH is not affected. > > I am not sure about a question: when assign devices in spapr host, > should all functions/devices in a PE be bound to vfio? I am kind of > confused about the relationship between a PE & a tce iommu group > > >> > >> 1. Correctable errors > >> There is no need to report these to guest. So let's not. > > > > What does this patch change to make this happen? I don't see > > anything. Was this always the case? No change? > > > > yes, no change on correctable error. > > >> > >> 2. Fatal errors > >> It's not easy to handle them gracefully since link reset > >> is needed. As a first step, let's use the existing mechanism > >> in that case. > > > > Ok, so no change here either. > > > >> 2. Non-fatal errors > >> Here we could make progress by reporting them to guest > >> and have guest handle them. > > > > In practice, what actual errors do we expect userspace to see as > > non-fatal errors? It would be useful for the commit log to describe > > the actual benefit we're going to see by splitting out non-fatal errors > > for the user (not always a guest) to see separately. Justify that this > > is actually useful. > > > >> > >> Issues: > >> a. this behaviour should only be enabled with new userspace, > >> old userspace should work without changes. > >> > >> Suggestion: One way to address this would be to add a new eventfd > >> non_fatal_err_trigger. If not set, invoke err_trigger. > > > > This outline format was really more useful for Michael to try to > > generate discussion, for a commit log, I'd much rather see a definitive > > statement such as: > > > > "To maintain backwards compatibility with userspace, non-fatal errors > > will continue to trigger via the existing error interrupt index if a > > non-fatal signaling mechanism has not been registered." > > > >> b. drivers are supposed to stop MMIO when error is reported, > >> if vm keeps going, we will keep doing MMIO/config. > >> > >> Suggestion 1: ignore this. vm stop happens much later when > >> userspace runs anyway, so we are not making things much worse. > >> > >> Suggestion 2: try to stop MMIO/config, resume on resume call > >> > >> Patch below implements Suggestion 1. > >> > >> Note that although this is really against the documentation, which > >> states error_detected() is the point at which the driver should quiesce > >> the device and not touch it further (until diagnostic poking at > >> mmio_enabled or full access at resume callback). > >> > >> Fixing this won't be easy. However, this is not a regression. > >> > >> Also note this does nothing about interrupts, documentation > >> suggests returning IRQ_NONE until reset. > >> Again, not a regression. > > > > So again, no change here. I'm not sure what this adds to the commit > > log, perhaps we can reference this as a link to Michael's original > > proposal. > > > >> c. PF driver might detect that function is completely broken, > >> if vm keeps going, we will keep doing MMIO/config. > >> > >> Suggestion 1: ignore this. vm stop happens much later when > >> userspace runs anyway, so we are not making things much worse. > >> > >> Suggestion 2: detect this and invoke err_trigger to stop VM. > >> > >> Patch below implements Suggestion 2. > > > > This needs more description and seems a bit misleading. This patch > > adds a slot_reset handler, such that if the slot is reset, we notify > > the user, essentially promoting the non-fatal error to fatal. But what > > condition gets us to this point? AIUI, AER is a voting scheme and if > > any driver affected says they need a reset, everyone gets a reset. So > > the PF driver we're talking about here is not vfio-pci and it's not the > > user, the user has no way to signal that the device is completely > > broken, this only handles the case of other collateral devices with > > native host drivers that might signal this, right? > > > > Yes, same understanding as you, if I don't miss something from michael. > > It seems like this is where this patch has the greatest exposure to > > regressions. If we take the VM use case, previously we could have a > > non-AER aware guest and the hypervisor could stop the VM on all > > errors. Now the hypervisor might support the distinction between fatal > > and non-fatal, but the guest may still not have AER support. That > > doesn't imply a problem with this approach, the user (hypervisor) would > > be at fault for any difference in handling in that case. > > > > >> > >> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev) > >> +{ > >> + struct vfio_pci_device *vdev; > >> + struct vfio_device *device; > >> + static pci_ers_result_t err = PCI_ERS_RESULT_NONE; > >> + > >> + device = vfio_device_get_from_dev(&pdev->dev); > >> + if (!device) > >> + goto err_dev; > >> + > >> + vdev = vfio_device_data(device); > >> + if (!vdev) > >> + goto err_data; > >> + > >> + mutex_lock(&vdev->igate); > >> + > >> + if (vdev->err_trigger) > >> + eventfd_signal(vdev->err_trigger, 1); > > > > What about the case where the user has not registered for receiving > > non-fatal errors, now we send an error signal on both error_detected > > and slot_reset. Is that useful/desirable? > > > > Not desirable, but seems not harmful, guest user will stop anyway. How > to avoid this case gracefully seems not easy. I actually see a clean way to do this. Let's add yet another eventfd to trigger when hosts resets the device itself. vdev->host_reset ? Users can use the same one as err_trigger if they like, it will be up to them. Alex? > -- > Sincerely, > Cao jin > > > >