EEH Support for VFIO PCI Device

This series of patches adds EEH support for PCI devices that are passed
through to PowerKVM based guests via VFIO. The implementation follows
directly from the problems we have to resolve to support EEH for PowerKVM
based guests.

- Emulation of EEH RTAS requests. All EEH RTAS requests go to QEMU first.
  If QEMU can't handle a request, it is sent to the host via the newly
  introduced VFIO container IOCTL command (VFIO_EEH_OP) and handled in the
  host kernel.

The series of patches requires corresponding QEMU changes.

Change log
==========
v1 -> v2:
  * EEH RTAS requests are routed to QEMU, and then possibly to the host
    kernel. The KVM in-kernel handling mechanism is dropped.
  * Error injection is reimplemented based on a syscall, instead of KVM
    in-kernel handling. The logic for error injection token management is
    moved to QEMU. The error injection request is routed to QEMU and then
    possibly to the host kernel.

v2 -> v3:
  * Rework the fields in struct eeh_vfio_pci_addr and struct vfio_eeh_info
    based on the comments from Alexey.
  * Define macros for EEH VFIO operations (Alexey).
  * Clear frozen state after successful PE reset.
  * Merge original [PATCH 1/2/3] into one.

v3 -> v4:
  * Remove the error injection from the patchset. Mike or I will work on
    that later.
  * Rename CONFIG_VFIO_EEH to VFIO_PCI_EEH.
  * Rename the IOCTL command to VFIO_EEH_OP; it is now handled by the
    VFIO-PCI device instead of the VFIO container.
  * Rename the IOCTL argument structure to "vfio_eeh_op" accordingly. Also,
    more fields are added to hold return values for RTAS requests.
  * The address mapping code is removed entirely. When a VFIO PCI device is
    opened or released, a notification is sent to EEH to update the flag
    that indicates whether the device is passed to a guest.
  * Change pr_warn() to pr_debug() to avoid DoS, as pointed out by Alex.W.
  * Fix the argument size check issue pointed out by Alex.W.

v4 -> v5:
  * Functions for VFIO PCI EEH support are moved to eeh.c and exported
    from there.
    The VFIO PCI driver just uses those functions to handle the IOCTL
    command VFIO_EEH_OP. This keeps the code well organized, as suggested
    by Alex.G. Another potential benefit is that PowerNV and pSeries share
    "eeh_ops", so the same infrastructure could possibly work for KVM_PR
    and KVM_HV modes at the same time.
  * Don't clear error injection registers after finishing PE reset, as the
    patchset does nothing related to error injection.
  * Amend Documentation/vfio.txt, which was missed in the last revision.
  * No QEMU changes for this revision; "v4" works well. Also, remove "RFC"
    from the subject as the design is basically accepted.

v5 -> v6:
  * CONFIG_VFIO_PCI_EEH is removed; CONFIG_EEH is used instead.
  * Split the one ioctl command into 5.
  * In eeh.c, descriptions have been added for the exported functions.
    The functions return negative values for errors and other values for
    information. All hard-coded numbers have been replaced by macros
    defined in eeh.h. The comments, including the function names, have
    been amended not to mention "guest" or "vfio".
  * Add one mutex to protect the flag in eeh_dev_open()/release().
  * Add more information on how to use those ioctl commands to
    Documentation/vfio.txt.

v6 -> v7:
  * Remove ioctl command VFIO_EEH_PE_GET_ADDR; the PE address will be
    figured out in userland (e.g. QEMU), as Alex.G suggested.
  * Let the sPAPR VFIO container process the ioctl commands, as a VFIO
    container naturally corresponds to an IOMMU group (aka PE on the sPAPR
    platform).
  * All VFIO PCI EEH ioctl commands have "argsz+flags" in their companion
    data struct.
  * For VFIO PCI EEH ioctl commands, ioctl() returns a negative number to
    indicate an error or zero for success. Additional output information
    is carried by the companion data struct.
  * Explain PE in Documentation/vfio.txt, fix typos, and add more comments
    as suggested by Alex.G.
  * Split/merge patches according to suggestions from Alex.G and Alex.W.
  * Add an EEH stub in drivers/vfio/pci/, as suggested by Alex.W.
  * Define various EEH options as macros in vfio.h for userland to use.

v7 -> v8:
  * Change the ioctl commands back to a single combined one.
  * EEH related logic is put into drivers/vfio/vfio_eeh.c, which is only
    built with CONFIG_EEH. Otherwise, inline functions defined in
    include/linux/vfio.h are used.
  * Change vfio.txt according to the source code changes.
  * Address various comments from internal reviews by Alexey. Thanks to
    Alexey.

v8 -> v9:
  * Remove unused macros in asm/include/eeh.h
  * Disable the VFIO device on error from vfio_spapr_pci_eeh_open(), which
    was missed previously.
  * Don't include unused header files in drivers/vfio/vfio_spapr_eeh.c
  * Define inline PE state for VFIO_EEH_PE_GET_STATE.

v9 -> v10:
  * Make sure struct vfio_eeh_pe_op::flags is zero.

Gavin Shan (3):
  powerpc/eeh: Avoid event on passed PE
  powerpc/eeh: EEH support for VFIO PCI device
  drivers/vfio: EEH support for VFIO PCI device

 Documentation/vfio.txt                    |  87 +++++++++-
 arch/powerpc/include/asm/eeh.h            |  19 ++
 arch/powerpc/kernel/eeh.c                 | 276 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/eeh-ioda.c |   3 +-
 drivers/vfio/Makefile                     |   1 +
 drivers/vfio/pci/vfio_pci.c               |  18 +-
 drivers/vfio/vfio_iommu_spapr_tce.c       |  17 +-
 drivers/vfio/vfio_spapr_eeh.c             |  87 ++++++++++
 include/linux/vfio.h                      |  23 +++
 include/uapi/linux/vfio.h                 |  34 ++++
 10 files changed, 556 insertions(+), 9 deletions(-)
 create mode 100644 drivers/vfio/vfio_spapr_eeh.c

--
1.8.3.2