David, I don't see the "spapr_pci: Allow PCI-Express devices" patch in your ppc-for-2.9 tree. Do you still consider merging it ? Cheers. -- Greg On Fri, 13 Jan 2017 09:58:28 +0100 Greg Kurz <groug@xxxxxxxx> wrote: > On Fri, 13 Jan 2017 15:48:31 +1100 > David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote: > > > On Thu, Jan 12, 2017 at 10:09:03AM +0100, Greg Kurz wrote: > > > On Thu, 12 Jan 2017 17:19:40 +1100 > > > Alexey Kardashevskiy <aik@xxxxxxxxx> wrote: > > > > > > > On 12/01/17 14:52, David Gibson wrote: > > > > > On Fri, Jan 06, 2017 at 12:57:58PM +0100, Greg Kurz wrote: > > > > >> On Thu, 5 Jan 2017 16:46:18 +1100 > > > > >> David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> wrote: > > > > >> > > > > >>> There was a discussion back in November on the qemu list which spilled > > > > >>> onto the libvirt list about how to add support for PCIe devices to > > > > >>> POWER VMs, specifically 'pseries' machine type PAPR guests. > > > > >>> > > > > >>> Here's a more concrete proposal for how to handle part of this in > > > > >>> future from the libvirt side. Strictly speaking what I'm suggesting > > > > >>> here isn't intrinsically linked to PCIe: it will make adding PCIe > > > > >>> support sanely easier, as well as having a number of advantages for > > > > >>> both PCIe and plain-PCI devices on PAPR guests. > > > > >>> > > > > >>> Background: > > > > >>> > > > > >>> * Currently the pseries machine type only supports vanilla PCI > > > > >>> buses. > > > > >>> * This is a qemu limitation, not something inherent - PAPR guests > > > > >>> running under PowerVM (the IBM hypervisor) can use passthrough > > > > >>> PCIe devices (PowerVM doesn't emulate devices though). > > > > >>> * In fact the way PCI access is para-virtalized in PAPR makes the > > > > >>> usual distinctions between PCI and PCIe largely disappear > > > > >>> * Presentation of PCIe devices to PAPR guests is unusual > > > > >>> * Unlike x86 - and other "bare metal" platforms, root ports are > > > > >>> not made visible to the guest. i.e. all devices (typically) > > > > >>> appear as though they were integrated devices on x86 > > > > >>> * In terms of topology all devices will appear in a way similar to > > > > >>> a vanilla PCI bus, even PCIe devices > > > > >>> * However PCIe extended config space is accessible > > > > >>> * This means libvirt's usual placement of PCIe devices is not > > > > >>> suitable for PAPR guests > > > > >>> * PAPR has its own hotplug mechanism > > > > >>> * This is used instead of standard PCIe hotplug > > > > >>> * This mechanism works for both PCIe and vanilla-PCI devices > > > > >>> * This can hotplug/unplug devices even without a root port P2P > > > > >>> bridge between it and the root "bus > > > > >>> * Multiple independent host bridges are routine on PAPR > > > > >>> * Unlike PC (where all host bridges have multiplexed access to > > > > >>> configuration space) PCI host bridges (PHBs) are truly > > > > >>> independent for PAPR guests (disjoint MMIO regions in system > > > > >>> address space) > > > > >>> * PowerVM typically presents a separate PHB to the guest for each > > > > >>> host slot passed through > > > > >>> > > > > >>> The Proposal: > > > > >>> > > > > >>> I suggest that libvirt implement a new default algorithm for placing > > > > >>> (i.e. assigning addresses to) both PCI and PCIe devices for (only) > > > > >>> PAPR guests. > > > > >>> > > > > >>> The short summary is that by default it should assign each device to a > > > > >>> separate vPHB, creating vPHBs as necessary. > > > > >>> > > > > >>> * For passthrough sometimes a group of host devices can't be safely > > > > >>> isolated from each other - this is known as a (host) Partitionable > > > > >>> Endpoint (PE). In this case, if any device in the PE is passed > > > > >>> through to a guest, the whole PE must be passed through to the > > > > >>> same vPHB in the guest. From the guest POV, each vPHB has exactly > > > > >>> one (guest) PE. > > > > >>> * To allow for hotplugged devices, libvirt should also add a number > > > > >>> of additional, empty vPHBs (the PAPR spec allows for hotplug of > > > > >>> PHBs, but this is not yet implemented in qemu). When hotplugging > > > > >>> a new device (or PE) libvirt should locate a vPHB which doesn't > > > > >>> currently contain anything. > > > > >>> * libvirt should only (automatically) add PHBs - never root ports or > > > > >>> other PCI to PCI bridges > > > > >>> > > > > >>> In order to handle migration, the vPHBs will need to be represented in > > > > >>> the domain XML, which will also allow the user to override this > > > > >>> topology if they want. > > > > >>> > > > > >>> Advantages: > > > > >>> > > > > >>> There are still some details I need to figure out w.r.t. handling PCIe > > > > >>> devices (on both the qemu and libvirt sides). However the fact that > > > > >> > > > > >> One such detail may be that PCIe devices should have the > > > > >> "ibm,pci-config-space-type" property set to 1 in the DT, > > > > >> for the driver to be able to access the extended config > > > > >> space. > > > > > > > > > > So, we have a bit of an oddity here. It looks like we currently set > > > > > 'ibm,pci-config-space-type' to 1 in the PHB, rather than individual > > > > > device nodes. Which, AFAICT, is simply incorrect in terms of PAPR. > > > > > > > > > > > > I asked Paul how to read the spec and this is rather correct but not enough > > > > - having type=1 on a PHB means that extended access requests can go behind > > > > it but underlying devices and bridges still need to have type=1 if they > > > > support extended space. Having type set to 0 (or none at all) on a PHB > > > > would mean that extended config space is not available on anything under > > > > this PHB. > > > > > > > > > > I have the very same understanding of the spec (LoPAPR March 2016): > > > > > > R1–9.1.8–2. All IOAs that implement PCI-X Mode 2 or PCI Express must supply the “ibm,pci-con- > > > fig-space-type” property (see Section B.6.5.1.1.1‚ “Properties for Children of PCI Host Bridges‚” on > > > page 703). > > > > > > Implementation Note: The “ibm,pci-config-space-type” property in Requirement R1–9.1.8–2 is added for > > > platforms that support I/O fabric and IOAs that implement PCI-X Mode 2, and PCI Express. To access the > > > extended configuration space provided by PCI-X Mode 2 and PCI Express, all I/O fabric leading up to an IOA > > > must support a 12-bit register number. In other words, if a platform implementation has a conventional PCI bridge > > > leading up to an IOA that implements PCI-X Mode 2, the platform will not be able to provide access to the > > > extended configuration space of that IOA. The “ibm,config-space-type” property in the IOA's OF node > > > is used by device drivers to determine if an IOA’s extended configuration space can be accessed. > > > > > > and > > > > > > B.6.5.1.1.1 Properties for Children of PCI Host Bridges > > > > > > “ibm,pci-config-space-type” > > > property name: Indicates if the platform supports access to an extended configuration address space from the PHB > > > up to and including this node. > > > 0 = Platform supports only an eight bit register number for configuration address space accesses. > > > 1 = Platform supports a twelve bit register number for configuration address space accesses. > > > This property may be provided in all PHB nodes and their children. > > > Note: The absence of this property implies the platform supports only an eight bit register number for configura- > > > tion address space accesses. > > > > > > > > > And incidentally, this is what the linux kernel currently expects. See these lines > > > from arch/powerpc/kernel/pci_dn.c: > > > > > > struct pci_dn *pci_add_device_node_info(struct pci_controller *hose, > > > struct device_node *dn) > > > { > > > const __be32 *type = of_get_property(dn, "ibm,pci-config-space-type", NULL); > > > . > > > . > > > . > > > /* Extended config space */ > > > pdn->pci_ext_config_space = (type && of_read_number(type, 1) == 1); > > > > Ok, thanks for the information. > > > > > I had to rework Alexey's "spapr_pci: Create PCI-express root bus by default" > > > patch to be able to see the extended config space of a vfio-pci device: > > > > Ah! Is there an easy command line way to verify that extended config > > space is accessible? > > > > 'lspci -vv' prints the capabilities from the config space with their > hex offset between [ ]. Extended config space starts at offset 0x100. > > > > --- a/hw/ppc/spapr_pci.c > > > +++ b/hw/ppc/spapr_pci.c > > > @@ -1052,6 +1052,9 @@ static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset, > > > _FDT(fdt_setprop(fdt, offset, "reg", (uint8_t *)rp.reg, rp.reg_len)); > > > _FDT(fdt_setprop(fdt, offset, "assigned-addresses", > > > (uint8_t *)rp.assigned, rp.assigned_len)); > > > + if (sphb->pcie_root && pci_is_express(dev)) { > > > + _FDT(fdt_setprop_cell(fdt, offset, "ibm,pci-config-space-type", 0x1)); > > > + } > > > > > > return 0; > > > } > > > > I'm looking at merging the patch below into ppc-for-2.9 shortly. It > > basically combines a revised version of the pcie option part of > > Alexey's patch with your DT patch above. For now it only allows PCIe > > with an explicit option, changing the default waits on more > > investigation of how to not break things with libvirt. > > > > From e14702a0bb76900077d82510280e1a023c384b72 Mon Sep 17 00:00:00 2001 > > From: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> > > Date: Fri, 13 Jan 2017 15:45:30 +1100 > > Subject: [PATCH] spapr_pci: Allow PCI-Express devices > > > > Currently the PCI host bridge (PHB) for the pseries machine type always > > creates a plain PCI bus (TYPE_PCI_BUS). This means that qemu won't allow > > PCIe devices to be attached, and consequently, PCIe devices can't be used > > on pseries. > > > > Well, PCIe devices can still be attached actually (at least I could attach > a vfio-pci device linked to a PCIe device on the host). But, they appear > as legacy PCI devices in the guest, without the extended config space. > > > This limitation isn't inherent in the PAPR specification. In fact, because > > access to the PCI bus is paravirtualized via hypercalls, from the guest > > point of view there is very little difference between a plain PCI and PCIe > > bus under PAPR. > > > > So, to allow PCIe devices, add a "pcie" option to the PHB device, which > > when enabled creates a PCIe bus. In addition, on PCIe devices under such > > a PHB, we publish the device tree flag which indicates that the guest may > > access PCIe extended config space (via RTAS calls which we've already > > implemented). > > > > Note that this doesn't change the default PHB to PCIe. We probably want to > > do that in future, but we need to sort out some possible breakages with > > libvirt and existing users first. This at least lets knowledgeable users > > experiment with PCIe devices in the meantime. > > > > Signed-off-by: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx> > > FWIW, > > Reviewed-by: Greg Kurz <groug@xxxxxxxx> > > > --- > > hw/ppc/spapr_pci.c | 8 +++++++- > > include/hw/pci-host/spapr.h | 1 + > > 2 files changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > > index fd6fc1d..d17df1f 100644 > > --- a/hw/ppc/spapr_pci.c > > +++ b/hw/ppc/spapr_pci.c > > @@ -1053,6 +1053,10 @@ static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset, > > _FDT(fdt_setprop(fdt, offset, "assigned-addresses", > > (uint8_t *)rp.assigned, rp.assigned_len)); > > > > + if (sphb->pcie && pci_is_express(dev)) { > > + _FDT(fdt_setprop_cell(fdt, offset, "ibm,pci-config-space-type", 0x1)); > > + } > > + > > return 0; > > } > > > > @@ -1434,7 +1438,8 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) > > bus = pci_register_bus(dev, NULL, > > pci_spapr_set_irq, pci_spapr_map_irq, sphb, > > &sphb->memspace, &sphb->iospace, > > - PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS); > > + PCI_DEVFN(0, 0), PCI_NUM_PINS, > > + sphb->pcie ? TYPE_PCIE_BUS : TYPE_PCI_BUS); > > phb->bus = bus; > > qbus_set_hotplug_handler(BUS(phb->bus), DEVICE(sphb), NULL); > > > > @@ -1592,6 +1597,7 @@ static Property spapr_phb_properties[] = { > > DEFINE_PROP_UINT32("numa_node", sPAPRPHBState, numa_node, -1), > > DEFINE_PROP_BOOL("pre-2.8-migration", sPAPRPHBState, > > pre_2_8_migration, false), > > + DEFINE_PROP_BOOL("pcie", sPAPRPHBState, pcie, false), > > DEFINE_PROP_END_OF_LIST(), > > }; > > > > diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h > > index 092294e..deed2cc 100644 > > --- a/include/hw/pci-host/spapr.h > > +++ b/include/hw/pci-host/spapr.h > > @@ -79,6 +79,7 @@ struct sPAPRPHBState { > > uint64_t dma64_win_addr; > > > > uint32_t numa_node; > > + bool pcie; > > > > /* Fields for migration compatibility hacks */ > > bool pre_2_8_migration; >
Attachment:
pgpeTg14ifk22.pgp
Description: OpenPGP digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list