On 13/01/17 15:48, David Gibson wrote:
> On Thu, Jan 12, 2017 at 10:09:03AM +0100, Greg Kurz wrote:
>> On Thu, 12 Jan 2017 17:19:40 +1100
>> Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:
>>
>>> On 12/01/17 14:52, David Gibson wrote:
>>>> On Fri, Jan 06, 2017 at 12:57:58PM +0100, Greg Kurz wrote:
>>>>> On Thu, 5 Jan 2017 16:46:18 +1100
>>>>> David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>> There was a discussion back in November on the qemu list which spilled
>>>>>> onto the libvirt list about how to add support for PCIe devices to
>>>>>> POWER VMs, specifically 'pseries' machine type PAPR guests.
>>>>>>
>>>>>> Here's a more concrete proposal for how to handle part of this in
>>>>>> future from the libvirt side. Strictly speaking, what I'm suggesting
>>>>>> here isn't intrinsically linked to PCIe: it will make adding PCIe
>>>>>> support sanely easier, as well as having a number of advantages for
>>>>>> both PCIe and plain-PCI devices on PAPR guests.
>>>>>>
>>>>>> Background:
>>>>>>
>>>>>>  * Currently the pseries machine type only supports vanilla PCI
>>>>>>    buses.
>>>>>>     * This is a qemu limitation, not something inherent - PAPR
>>>>>>       guests running under PowerVM (the IBM hypervisor) can use
>>>>>>       passthrough PCIe devices (PowerVM doesn't emulate devices,
>>>>>>       though).
>>>>>>     * In fact, the way PCI access is para-virtualized in PAPR makes
>>>>>>       the usual distinctions between PCI and PCIe largely disappear.
>>>>>>  * Presentation of PCIe devices to PAPR guests is unusual.
>>>>>>     * Unlike x86 - and other "bare metal" platforms - root ports are
>>>>>>       not made visible to the guest, i.e. all devices (typically)
>>>>>>       appear as though they were integrated devices on x86.
>>>>>>     * In terms of topology, all devices will appear in a way similar
>>>>>>       to a vanilla PCI bus, even PCIe devices.
>>>>>>     * However, PCIe extended config space is accessible.
>>>>>>     * This means libvirt's usual placement of PCIe devices is not
>>>>>>       suitable for PAPR guests.
>>>>>>  * PAPR has its own hotplug mechanism.
>>>>>>     * This is used instead of standard PCIe hotplug.
>>>>>>     * This mechanism works for both PCIe and vanilla-PCI devices.
>>>>>>     * This can hotplug/unplug devices even without a root port or
>>>>>>       P2P bridge between it and the root bus.
>>>>>>  * Multiple independent host bridges are routine on PAPR.
>>>>>>     * Unlike PC (where all host bridges have multiplexed access to
>>>>>>       configuration space), PCI host bridges (PHBs) are truly
>>>>>>       independent for PAPR guests (disjoint MMIO regions in system
>>>>>>       address space).
>>>>>>     * PowerVM typically presents a separate PHB to the guest for
>>>>>>       each host slot passed through.
>>>>>>
>>>>>> The Proposal:
>>>>>>
>>>>>> I suggest that libvirt implement a new default algorithm for placing
>>>>>> (i.e. assigning addresses to) both PCI and PCIe devices for (only)
>>>>>> PAPR guests.
>>>>>>
>>>>>> The short summary is that by default it should assign each device to
>>>>>> a separate vPHB, creating vPHBs as necessary.
>>>>>>
>>>>>>  * For passthrough, sometimes a group of host devices can't be safely
>>>>>>    isolated from each other - this is known as a (host) Partitionable
>>>>>>    Endpoint (PE). In this case, if any device in the PE is passed
>>>>>>    through to a guest, the whole PE must be passed through to the
>>>>>>    same vPHB in the guest. From the guest POV, each vPHB has exactly
>>>>>>    one (guest) PE.
>>>>>>  * To allow for hotplugged devices, libvirt should also add a number
>>>>>>    of additional, empty vPHBs (the PAPR spec allows for hotplug of
>>>>>>    PHBs, but this is not yet implemented in qemu). When hotplugging
>>>>>>    a new device (or PE), libvirt should locate a vPHB which doesn't
>>>>>>    currently contain anything.
>>>>>>  * libvirt should only (automatically) add PHBs - never root ports or
>>>>>>    other PCI-to-PCI bridges.
>>>>>>
>>>>>> In order to handle migration, the vPHBs will need to be represented
>>>>>> in the domain XML, which will also allow the user to override this
>>>>>> topology if they want.
>>>>>>
>>>>>> Advantages:
>>>>>>
>>>>>> There are still some details I need to figure out w.r.t. handling
>>>>>> PCIe devices (on both the qemu and libvirt sides). However the fact
>>>>>> that
>>>>>
>>>>> One such detail may be that PCIe devices should have the
>>>>> "ibm,pci-config-space-type" property set to 1 in the DT,
>>>>> for the driver to be able to access the extended config
>>>>> space.
>>>>
>>>> So, we have a bit of an oddity here. It looks like we currently set
>>>> 'ibm,pci-config-space-type' to 1 in the PHB, rather than in individual
>>>> device nodes. Which, AFAICT, is simply incorrect in terms of PAPR.
>>>
>>> I asked Paul how to read the spec, and this is rather correct but not
>>> enough - having type=1 on a PHB means that extended access requests can
>>> go behind it, but underlying devices and bridges still need to have
>>> type=1 if they support extended space. Having type set to 0 (or not set
>>> at all) on a PHB would mean that extended config space is not available
>>> on anything under this PHB.
>>
>> I have the very same understanding of the spec (LoPAPR March 2016):
>>
>>   R1–9.1.8–2. All IOAs that implement PCI-X Mode 2 or PCI Express must
>>   supply the “ibm,pci-config-space-type” property (see Section
>>   B.6.5.1.1.1, “Properties for Children of PCI Host Bridges,” on page
>>   703).
>>
>>   Implementation Note: The “ibm,pci-config-space-type” property in
>>   Requirement R1–9.1.8–2 is added for platforms that support I/O fabric
>>   and IOAs that implement PCI-X Mode 2, and PCI Express. To access the
>>   extended configuration space provided by PCI-X Mode 2 and PCI Express,
>>   all I/O fabric leading up to an IOA must support a 12-bit register
>>   number. In other words, if a platform implementation has a
>>   conventional PCI bridge leading up to an IOA that implements PCI-X
>>   Mode 2, the platform will not be able to provide access to the
>>   extended configuration space of that IOA. The “ibm,config-space-type”
>>   property in the IOA's OF node is used by device drivers to determine
>>   if an IOA's extended configuration space can be accessed.
>>
>> and
>>
>>   B.6.5.1.1.1 Properties for Children of PCI Host Bridges
>>
>>   “ibm,pci-config-space-type”
>>     property name: Indicates if the platform supports access to an
>>     extended configuration address space from the PHB up to and
>>     including this node.
>>     0 = Platform supports only an eight bit register number for
>>         configuration address space accesses.
>>     1 = Platform supports a twelve bit register number for
>>         configuration address space accesses.
>>     This property may be provided in all PHB nodes and their children.
>>     Note: The absence of this property implies the platform supports
>>     only an eight bit register number for configuration address space
>>     accesses.
>>
>> And incidentally, this is what the linux kernel currently expects. See
>> these lines from arch/powerpc/kernel/pci_dn.c:
>>
>>   struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
>>                                           struct device_node *dn)
>>   {
>>           const __be32 *type = of_get_property(dn,
>>                           "ibm,pci-config-space-type", NULL);
>>   ...
>>           /* Extended config space */
>>           pdn->pci_ext_config_space = (type &&
>>                           of_read_number(type, 1) == 1);
>
> Ok, thanks for the information.
>
>> I had to rework Alexey's "spapr_pci: Create PCI-express root bus by
>> default" patch to be able to see the extended config space of a
>> vfio-pci device:
>
> Ah! Is there an easy command line way to verify that extended config
> space is accessible?

I do "lspci -vvs 0003:01:00.3" and look for "Capabilities: [xxx v1]"
where xxx >= 0x100.

--
Alexey
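Alexey's check can be scripted. The sketch below (not from the thread; the helper name and parsing are assumptions about lspci's usual output) scans captured `lspci -vv` output for a capability whose config-space offset is 0x100 or above, i.e. one that lives in PCIe extended configuration space beyond the 256-byte conventional space:

```python
import re

def has_extended_capability(lspci_vv_output):
    """Return True if any capability in `lspci -vv` output sits in
    PCIe extended config space (offset >= 0x100)."""
    # lspci prints e.g. "Capabilities: [60] Power Management" for legacy
    # capabilities and "Capabilities: [100 v1] Advanced Error Reporting"
    # for extended ones; the offset inside [] is hexadecimal.
    offsets = re.findall(r"Capabilities:\s+\[([0-9a-f]+)(?:\s+v\d+)?\]",
                         lspci_vv_output)
    return any(int(off, 16) >= 0x100 for off in offsets)

# Example: feed it output captured from e.g. `lspci -vvs 0003:01:00.3`.
sample = (
    "Capabilities: [60] Power Management version 3\n"
    "Capabilities: [100 v1] Advanced Error Reporting\n"
)
print(has_extended_capability(sample))  # True: offset 0x100 is extended space
```

If the guest cannot reach extended config space, lspci will only show capabilities below 0x100, so this returns False.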
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list