On 12/01/17 14:52, David Gibson wrote:
> On Fri, Jan 06, 2017 at 12:57:58PM +0100, Greg Kurz wrote:
>> On Thu, 5 Jan 2017 16:46:18 +1100
>> David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>> There was a discussion back in November on the qemu list which spilled
>>> onto the libvirt list about how to add support for PCIe devices to
>>> POWER VMs, specifically 'pseries' machine type PAPR guests.
>>>
>>> Here's a more concrete proposal for how to handle part of this in
>>> future from the libvirt side. Strictly speaking what I'm suggesting
>>> here isn't intrinsically linked to PCIe: it will make adding PCIe
>>> support sanely easier, as well as having a number of advantages for
>>> both PCIe and plain-PCI devices on PAPR guests.
>>>
>>> Background:
>>>
>>>  * Currently the pseries machine type only supports vanilla PCI
>>>    buses.
>>>     * This is a qemu limitation, not something inherent - PAPR guests
>>>       running under PowerVM (the IBM hypervisor) can use passthrough
>>>       PCIe devices (PowerVM doesn't emulate devices though).
>>>     * In fact the way PCI access is para-virtualized in PAPR makes the
>>>       usual distinctions between PCI and PCIe largely disappear
>>>  * Presentation of PCIe devices to PAPR guests is unusual
>>>     * Unlike x86 - and other "bare metal" platforms, root ports are
>>>       not made visible to the guest. i.e. all devices (typically)
>>>       appear as though they were integrated devices on x86
>>>     * In terms of topology all devices will appear in a way similar to
>>>       a vanilla PCI bus, even PCIe devices
>>>        * However PCIe extended config space is accessible
>>>     * This means libvirt's usual placement of PCIe devices is not
>>>       suitable for PAPR guests
>>>  * PAPR has its own hotplug mechanism
>>>     * This is used instead of standard PCIe hotplug
>>>     * This mechanism works for both PCIe and vanilla-PCI devices
>>>     * This can hotplug/unplug devices even without a root port P2P
>>>       bridge between it and the root "bus"
>>>  * Multiple independent host bridges are routine on PAPR
>>>     * Unlike PC (where all host bridges have multiplexed access to
>>>       configuration space) PCI host bridges (PHBs) are truly
>>>       independent for PAPR guests (disjoint MMIO regions in system
>>>       address space)
>>>     * PowerVM typically presents a separate PHB to the guest for each
>>>       host slot passed through
>>>
>>> The Proposal:
>>>
>>> I suggest that libvirt implement a new default algorithm for placing
>>> (i.e. assigning addresses to) both PCI and PCIe devices for (only)
>>> PAPR guests.
>>>
>>> The short summary is that by default it should assign each device to a
>>> separate vPHB, creating vPHBs as necessary.
>>>
>>>  * For passthrough sometimes a group of host devices can't be safely
>>>    isolated from each other - this is known as a (host) Partitionable
>>>    Endpoint (PE). In this case, if any device in the PE is passed
>>>    through to a guest, the whole PE must be passed through to the
>>>    same vPHB in the guest. From the guest POV, each vPHB has exactly
>>>    one (guest) PE.
>>>  * To allow for hotplugged devices, libvirt should also add a number
>>>    of additional, empty vPHBs (the PAPR spec allows for hotplug of
>>>    PHBs, but this is not yet implemented in qemu). When hotplugging
>>>    a new device (or PE) libvirt should locate a vPHB which doesn't
>>>    currently contain anything.
>>>  * libvirt should only (automatically) add PHBs - never root ports or
>>>    other PCI to PCI bridges
>>>
>>> In order to handle migration, the vPHBs will need to be represented in
>>> the domain XML, which will also allow the user to override this
>>> topology if they want.
>>>
>>> Advantages:
>>>
>>> There are still some details I need to figure out w.r.t. handling PCIe
>>> devices (on both the qemu and libvirt sides). However the fact that
>>
>> One such detail may be that PCIe devices should have the
>> "ibm,pci-config-space-type" property set to 1 in the DT,
>> for the driver to be able to access the extended config
>> space.
>
> So, we have a bit of an oddity here. It looks like we currently set
> 'ibm,pci-config-space-type' to 1 in the PHB, rather than individual
> device nodes. Which, AFAICT, is simply incorrect in terms of PAPR.

I asked Paul how to read the spec, and this is correct but not
sufficient - having type=1 on a PHB means that extended access requests
can go behind it, but underlying devices and bridges still need to have
type=1 if they support extended space.

Having type set to 0 (or not set at all) on a PHB would mean that
extended config space is not available on anything under this PHB.

--
Alexey
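
To make the reading of 'ibm,pci-config-space-type' above concrete, here
is a minimal Python sketch. It is not qemu or guest kernel code: the Node
class and the extended_config_reachable() helper are hypothetical names
made up for illustration. It also assumes that a bridge without type=1
blocks extended access for everything below it, in the same way a PHB
does.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Hypothetical stand-in for a device tree node (PHB, bridge or device)."""
    name: str
    # Value of "ibm,pci-config-space-type"; None if the property is absent.
    config_space_type: Optional[int] = None


def extended_config_reachable(path: Sequence[Node]) -> bool:
    """Check a path that runs from the PHB down to the device, inclusive.

    Extended config space is treated as usable only if every node on the
    path advertises type 1. A value of 0, or a missing property, anywhere
    on the path blocks it (assumption for bridges, stated for PHBs).
    """
    return len(path) > 0 and all(n.config_space_type == 1 for n in path)
```

With this model, a PHB with type=1 and a device node without the
property gives False, which is the "correct but not sufficient" case
described above.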
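
The placement rule quoted from David's proposal (one vPHB per device or
PE, plus some empty vPHBs kept for hotplug) can be sketched roughly as
follows. This is only an illustration under stated assumptions, not
libvirt code: place_on_vphbs(), find_hotplug_vphb(), the device-to-PE
mapping and the default of two spare vPHBs are all hypothetical.

```python
from typing import Dict, List


def place_on_vphbs(device_to_pe: Dict[str, str],
                   spare_vphbs: int = 2) -> List[List[str]]:
    """Return one device list per vPHB index.

    device_to_pe maps a device name to its host PE identifier. Devices
    that share a PE are placed on the same vPHB; every other device gets
    a vPHB of its own. spare_vphbs empty vPHBs are appended for hotplug.
    """
    groups: Dict[str, List[str]] = {}
    for dev, pe in device_to_pe.items():
        groups.setdefault(pe, []).append(dev)
    vphbs = list(groups.values())  # dicts preserve insertion order
    vphbs.extend([] for _ in range(spare_vphbs))
    return vphbs


def find_hotplug_vphb(vphbs: List[List[str]]) -> int:
    """Return the index of the first empty vPHB, or -1 if none is left."""
    for index, devices in enumerate(vphbs):
        if not devices:
            return index
    return -1
```

No root ports or PCI-to-PCI bridges appear anywhere in this model, which
matches the proposal that libvirt should only add PHBs automatically.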
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list