On Fri, Jan 06, 2017 at 12:57:58PM +0100, Greg Kurz wrote:
> On Thu, 5 Jan 2017 16:46:18 +1100
> David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> > There was a discussion back in November on the qemu list which spilled
> > onto the libvirt list about how to add support for PCIe devices to
> > POWER VMs, specifically 'pseries' machine type PAPR guests.
> >
> > Here's a more concrete proposal for how to handle part of this in
> > future from the libvirt side. Strictly speaking what I'm suggesting
> > here isn't intrinsically linked to PCIe: it will make adding PCIe
> > support sanely easier, as well as having a number of advantages for
> > both PCIe and plain-PCI devices on PAPR guests.
> >
> > Background:
> >
> >  * Currently the pseries machine type only supports vanilla PCI
> >    buses.
> >     * This is a qemu limitation, not something inherent - PAPR guests
> >       running under PowerVM (the IBM hypervisor) can use passthrough
> >       PCIe devices (PowerVM doesn't emulate devices though).
> >     * In fact the way PCI access is para-virtualized in PAPR makes the
> >       usual distinctions between PCI and PCIe largely disappear
> >  * Presentation of PCIe devices to PAPR guests is unusual
> >     * Unlike x86 - and other "bare metal" platforms, root ports are
> >       not made visible to the guest. i.e. all devices (typically)
> >       appear as though they were integrated devices on x86
> >     * In terms of topology all devices will appear in a way similar to
> >       a vanilla PCI bus, even PCIe devices
> >        * However PCIe extended config space is accessible
> >     * This means libvirt's usual placement of PCIe devices is not
> >       suitable for PAPR guests
> >  * PAPR has its own hotplug mechanism
> >     * This is used instead of standard PCIe hotplug
> >     * This mechanism works for both PCIe and vanilla-PCI devices
> >     * This can hotplug/unplug devices even without a root port P2P
> >       bridge between it and the root "bus"
> >  * Multiple independent host bridges are routine on PAPR
> >     * Unlike PC (where all host bridges have multiplexed access to
> >       configuration space) PCI host bridges (PHBs) are truly
> >       independent for PAPR guests (disjoint MMIO regions in system
> >       address space)
> >     * PowerVM typically presents a separate PHB to the guest for each
> >       host slot passed through
> >
> > The Proposal:
> >
> > I suggest that libvirt implement a new default algorithm for placing
> > (i.e. assigning addresses to) both PCI and PCIe devices for (only)
> > PAPR guests.
> >
> > The short summary is that by default it should assign each device to a
> > separate vPHB, creating vPHBs as necessary.
> >
> >  * For passthrough sometimes a group of host devices can't be safely
> >    isolated from each other - this is known as a (host) Partitionable
> >    Endpoint (PE). In this case, if any device in the PE is passed
> >    through to a guest, the whole PE must be passed through to the
> >    same vPHB in the guest. From the guest POV, each vPHB has exactly
> >    one (guest) PE.
> >  * To allow for hotplugged devices, libvirt should also add a number
> >    of additional, empty vPHBs (the PAPR spec allows for hotplug of
> >    PHBs, but this is not yet implemented in qemu). When hotplugging
> >    a new device (or PE) libvirt should locate a vPHB which doesn't
> >    currently contain anything.
> >  * libvirt should only (automatically) add PHBs - never root ports or
> >    other PCI to PCI bridges
> >
> > In order to handle migration, the vPHBs will need to be represented in
> > the domain XML, which will also allow the user to override this
> > topology if they want.
> >
> > Advantages:
> >
> > There are still some details I need to figure out w.r.t. handling PCIe
> > devices (on both the qemu and libvirt sides). However the fact that
>
> One such detail may be that PCIe devices should have the
> "ibm,pci-config-space-type" property set to 1 in the DT,
> for the driver to be able to access the extended config
> space.

Right.

> > PAPR guests don't typically see PCIe root ports means that the normal
> > libvirt PCIe allocation scheme won't work. This scheme has several
> > advantages with or without support for PCIe devices:
> >
> >  * Better performance for 32-bit devices
> >
> > With multiple devices on a single vPHB they all must share a (fairly
> > small) 32-bit DMA/IOMMU window. With separate PHBs they each have a
> > separate window. PAPR guests have an always-on guest visible IOMMU.
> >
> >  * Better EEH handling for passthrough devices
> >
> > EEH is an IBM hardware-assisted mechanism for isolating and safely
> > resetting devices experiencing hardware faults so they don't bring
> > down other devices or the system at large. It's roughly similar to
> > PCIe AER in concept, but has a different IBM specific interface, and
> > works on both PCI and PCIe devices.
> >
> > Currently the kernel interfaces for handling EEH events on passthrough
> > devices will only work if there is a single (host) iommu group in the
> > vfio container. While lifting that restriction would be nice, it's
> > quite difficult to do so (it requires keeping state synchronized
> > between multiple host groups). That also means that an EEH error on
> > one device could stop another device where that isn't required by the
> > actual hardware.
> >
> > The unit of EEH isolation is a PE (Partitionable Endpoint) and
> > currently there is only one guest PE per vPHB. Changing this might
> > also be possible, but is again quite complex and may result in
> > confusing and/or broken distinctions between groups for EEH isolation
> > and IOMMU isolation purposes.
> >
> > Placing separate host groups in separate vPHBs sidesteps these
> > problems.
> >
> >  * Guest NUMA node assignment of devices
> >
> > PAPR does not (and can't reasonably) use the pxb device. Instead to
> > allocate devices to different guest NUMA nodes they should be placed
> > on different vPHBs. Placing them on different PHBs by default allows
> > NUMA nodes to be assigned to those PHBs in a straightforward manner.
> >
>

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
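
For illustration only, the default placement described in the proposal (one vPHB per device, devices from the same host PE kept together on one vPHB, plus a few empty vPHBs reserved for hotplug) could be sketched roughly as below. This is not libvirt code: the names Device, VPHB, place_devices, hotplug and the spare_vphbs default are all hypothetical, and the sketch only groups the devices it is given by PE. It does not check that every device of a host PE has actually been passed through.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Device:
    """Hypothetical device record, not a libvirt structure."""
    name: str
    # Host Partitionable Endpoint for passthrough devices; None for
    # emulated devices, which carry no host isolation constraint.
    host_pe: Optional[str] = None


@dataclass
class VPHB:
    """Hypothetical virtual PCI host bridge holding zero or more devices."""
    index: int
    devices: List[Device] = field(default_factory=list)


def place_devices(devices: List[Device], spare_vphbs: int = 2) -> List[VPHB]:
    """Give each device its own vPHB, except that devices sharing a host PE
    share one vPHB. Append empty vPHBs for later hotplug (assumed count)."""
    vphbs: List[VPHB] = []
    pe_to_vphb: Dict[str, VPHB] = {}
    for dev in devices:
        if dev.host_pe is not None and dev.host_pe in pe_to_vphb:
            # Same host PE as an earlier device: must land on the same vPHB.
            pe_to_vphb[dev.host_pe].devices.append(dev)
            continue
        vphb = VPHB(index=len(vphbs), devices=[dev])
        vphbs.append(vphb)
        if dev.host_pe is not None:
            pe_to_vphb[dev.host_pe] = vphb
    # Only PHBs are ever added: no root ports, no PCI-to-PCI bridges.
    for _ in range(spare_vphbs):
        vphbs.append(VPHB(index=len(vphbs)))
    return vphbs


def hotplug(vphbs: List[VPHB], new_devices: List[Device]) -> VPHB:
    """Put a hotplugged device (or all devices of one PE) on the first
    vPHB that does not currently contain anything."""
    for vphb in vphbs:
        if not vphb.devices:
            vphb.devices.extend(new_devices)
            return vphb
    raise RuntimeError("no empty vPHB available for hotplug")
```

With one emulated NIC, two passthrough functions in the same host PE, and one emulated disk, place_devices would produce three populated vPHBs plus the spares, and a later hotplug call would use the first spare. How many spares to create by default is left open in the proposal.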