On 01/21/2017 01:01 AM, Michael S. Tsirkin wrote: > On Fri, Jan 20, 2017 at 06:13:22PM +0800, Cao jin wrote: >> >> >> On 01/20/2017 04:16 AM, Michael S. Tsirkin wrote: >>> This is a design and an initial patch for kernel side for AER >>> support in VFIO. >>> >>> 0. What happens now (PCIE AER only) >>> Fatal errors cause a link reset. >>> Non fatal errors don't. >>> All errors stop the VM eventually, but not immediately >>> because it's detected and reported asynchronously. >>> Interrupts are forwarded as usual. >>> Correctable errors are not reported to guest at all. >>> Note: PPC EEH is different. This focuses on AER. >>> >>> 1. Correctable errors >>> I don't see a need to report these to guest. So let's not. >>> >>> 2. Fatal errors >>> It's not easy to handle them gracefully since link reset >>> is needed. As a first step, let's use the existing mechanism >>> in that case. >>> >>> 2. Non-fatal errors >>> Here we could make progress by reporting them to guest >>> and have guest handle them. >>> Issues: >>> a. this behaviour should only be enabled with new userspace >>> old userspace should work without changes >>> Suggestion: One way to address this would be to add a new eventfd >>> non_fatal_err_trigger. If not set, invoke err_trigger. >>> >>> b. drivers are supposed to stop MMIO when error is reported >>> if vm keeps going, we will keep doing MMIO/config >>> Suggestion 1: ignore this. vm stop happens much later when userspace runs anyway, >>> so we are not making things much worse >>> Suggestion 2: try to stop MMIO/config, resume on resume call >>> >>> Patch below implements Suggestion 1. >>> >>> c. PF driver might detect that function is completely broken, >>> if vm keeps going, we will keep doing MMIO/config >>> Suggestion 1: ignore this. vm stop happens much later when userspace runs anyway, >>> so we are not making things much worse >>> Suggestion 2: detect this and invoke err_trigger to stop VM >>> >>> Patch below implements Suggestion 2. >>> >>> Aside: we currently return PCI_ERS_RESULT_DISCONNECT when device >>> is not attached. This seems bogus, likely based on the confusing name. >>> We probably should return PCI_ERS_RESULT_CAN_RECOVER. >>> >>> The following patch does not change that. >>> >>> Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx> >>> >>> --- >>> >>> The patch is completely untested. Let's discuss the design first. >>> Cao jin, if this is deemed acceptable please take it from here. >>> >> >> Ok, thanks very much. >> >>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c >>> index dce511f..fdca683 100644 >>> --- a/drivers/vfio/pci/vfio_pci.c >>> +++ b/drivers/vfio/pci/vfio_pci.c >>> @@ -1292,7 +1292,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, >>> >>> mutex_lock(&vdev->igate); >>> >>> - if (vdev->err_trigger) >>> + if (state == pci_channel_io_normal && vdev->non_fatal_err_trigger) >>> + eventfd_signal(vdev->err_trigger, 1); >>> + else if (vdev->err_trigger) >>> eventfd_signal(vdev->err_trigger, 1); >>> >>> mutex_unlock(&vdev->igate); >>> @@ -1302,8 +1304,38 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, >>> return PCI_ERS_RESULT_CAN_RECOVER; >>> } >>> >>> +static pci_ers_result_t vfio_pci_aer_slot_reset(struct pci_dev *pdev, >>> + pci_channel_state_t state) >>> +{ >>> + struct vfio_pci_device *vdev; >>> + struct vfio_device *device; >>> + >>> + device = vfio_device_get_from_dev(&pdev->dev); >>> + if (!device) >>> + goto err_dev; >>> + >>> + vdev = vfio_device_data(device); >>> + if (!vdev) >>> + goto err_dev; >>> + >>> + mutex_lock(&vdev->igate); >>> + >>> + if (vdev->err_trigger) >>> + eventfd_signal(vdev->err_trigger, 1); >>> + >>> + mutex_unlock(&vdev->igate); >>> + >>> + vfio_device_put(device); >>> + >>> +err_data: >>> + vfio_device_put(device); >>> +err_dev: >>> + return PCI_ERS_RESULT_RECOVERED; >>> +} >>> + >>> static const struct pci_error_handlers vfio_err_handlers = { >>> .error_detected = vfio_pci_aer_err_detected, >>> + .slot_reset = vfio_pci_aer_slot_reset, >>> }; >>> >> >> if .slot_reset wants to be called, .error_detected should return >> PCI_ERS_RESULT_NEED_RESET, as pci-error-recovery.txt said, so does code. >> >> Is .slot_reset now just a copy of .error_detected and we are going do >> some tricks here? or else don't get why .slot_reset signal user again. > > > No. We do not want a slot reset. But it might be triggered by > another (PF) driver on the slot. If that happens some driver did something > to make devices in the slot go to reset and our driver must recover. > We can't recover however, so let's stop the VM. > > Basically the design is simple > - if you can keep going - do > - if you can't - ask qemu to stop guest > > Don't worry about adding more conditions to trigger reset etc > at this point. Do that with later patches on-top. > It take me for a long while to find what I missed. I missed much details in pci-error-recovery.txt, and I was not conscious of a multi-function device which could have different functions(driver) on it. Thank you both. -- Sincerely, Cao jin >>> static struct pci_driver vfio_pci_driver = { >>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c >>> index 1c46045..e883db5 100644 >>> --- a/drivers/vfio/pci/vfio_pci_intrs.c >>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c >>> @@ -611,6 +611,17 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev, >>> count, flags, data); >>> } >>> >>> +static int vfio_pci_set_non_fatal_err_trigger(struct vfio_pci_device *vdev, >>> + unsigned index, unsigned start, >>> + unsigned count, uint32_t flags, void *data) >>> +{ >>> + if (index != VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX || start != 0 || count > 1) >>> + return -EINVAL; >>> + >>> + return vfio_pci_set_ctx_trigger_single(&vdev->non_fatal_err_trigger, >>> + count, flags, data); >>> +} >>> + >>> static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev, >>> unsigned index, unsigned start, >>> unsigned count, uint32_t flags, void *data) >>> @@ -664,6 +675,14 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags, >>> break; >>> } >>> break; >>> + case VFIO_PCI_NON_FATAL_ERR_IRQ_INDEX: >>> + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { >>> + case VFIO_IRQ_SET_ACTION_TRIGGER: >>> + if (pci_is_pcie(vdev->pdev)) >>> + func = vfio_pci_set_err_trigger; >> >> s/vfio_pci_set_err_trigger/vfio_pci_set_non_fatal_err_trigger > > Right. > >>> + break; >>> + } >>> + break; >>> case VFIO_PCI_REQ_IRQ_INDEX: >>> switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { >>> case VFIO_IRQ_SET_ACTION_TRIGGER: >>> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h >>> index f37c73b..c27a507 100644 >>> --- a/drivers/vfio/pci/vfio_pci_private.h >>> +++ b/drivers/vfio/pci/vfio_pci_private.h >>> @@ -93,6 +93,7 @@ struct vfio_pci_device { >>> struct pci_saved_state *pci_saved_state; >>> int refcnt; >>> struct eventfd_ctx *err_trigger; >>> + struct eventfd_ctx *non_fatal_err_trigger; >>> struct eventfd_ctx *req_trigger; >>> struct list_head dummy_resources_list; >>> }; >>> >> >> -- >> Sincerely, >> Cao jin >> > > > . >