On Tue, 2016-11-01 at 13:46 +1100, David Gibson wrote: > On Mon, Oct 31, 2016 at 03:10:23PM +1100, Alexey Kardashevskiy wrote: > > > > On 31/10/16 13:53, David Gibson wrote: > > > > > > On Fri, Oct 28, 2016 at 12:07:12PM +0200, Greg Kurz wrote: > > > > > > > > On Fri, 28 Oct 2016 18:56:40 +1100 > > > > Alexey Kardashevskiy <aik@xxxxxxxxx> wrote: > > > > > > > > > > > > > > At the moment sPAPR PHB creates a root buf of TYPE_PCI_BUS type. > > > > > This means that vfio-pci devices attached to it (and this is > > > > > a default behaviour) hide PCIe extended capabilities as > > > > > the bus does not pass a pci_bus_is_express(pdev->bus) check. > > > > > > > > > > This changes adds a default PCI bus type property to sPAPR PHB > > > > > and uses TYPE_PCIE_BUS if none passed; older machines get TYPE_PCI_BUS > > > > > for backward compatibility as a bus type is used in the bus name > > > > > so the root bus name becomes "pcie.0" instead of "pci.0". > > > > > > > > > > Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx> > > > > > --- > > > > > > > > > > What can possibly go wrong with such change of a name? > > > > > From devices prospective, I cannot see any. > > > > > > > > > > libvirt might get upset as "pci.0" will not be available, > > > > > will it make sense to create pcie.0 as a root bus and always > > > > > add a PCIe->PCI bridge and name its bus "pci.0"? > > > > > > > > > > Or create root bus from TYPE_PCIE_BUS and force name to "pci.0"? > > > > > pci_register_bus() can do this. > > > > > > > > > > > > > > > --- > > > > > hw/ppc/spapr.c | 5 +++++ > > > > > hw/ppc/spapr_pci.c | 5 ++++- > > > > > include/hw/pci-host/spapr.h | 1 + > > > > > 3 files changed, 10 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > > > > index 0b3820b..a268511 100644 > > > > > --- a/hw/ppc/spapr.c > > > > > +++ b/hw/ppc/spapr.c > > > > > @@ -2541,6 +2541,11 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true); > > > > > .driver = TYPE_SPAPR_PCI_HOST_BRIDGE, \ > > > > > .property = "mem64_win_size", \ > > > > > .value = "0", \ > > > > > + }, \ > > > > > + { \ > > > > > + .driver = TYPE_SPAPR_PCI_HOST_BRIDGE, \ > > > > > + .property = "root_bus_type", \ > > > > > + .value = TYPE_PCI_BUS, \ > > > > > }, > > > > > > > > > > static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index, > > > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > > > > > index 7cde30e..2fa1f22 100644 > > > > > --- a/hw/ppc/spapr_pci.c > > > > > +++ b/hw/ppc/spapr_pci.c > > > > > @@ -1434,7 +1434,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) > > > > > bus = pci_register_bus(dev, NULL, > > > > > pci_spapr_set_irq, pci_spapr_map_irq, sphb, > > > > > &sphb->memspace, &sphb->iospace, > > > > > - PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS); > > > > > + PCI_DEVFN(0, 0), PCI_NUM_PINS, > > > > > + sphb->root_bus_type ? sphb->root_bus_type : > > > > > + TYPE_PCIE_BUS); > > > > > > > > Shouldn't we ensure that sphb->root_bus_type is either TYPE_PCIE_BUS or > > > > TYPE_PCI_BUS ? > > > > > > Yes, I think so. In fact, I think it would be better to make the > > > property a boolean that just selects PCI-E, rather than this which > > > exposes qemu (semi-)internal type names on the comamnd line. > > > > Sure, a "pcie-root" boolean property should do. > > > > However this is not my main concern, I rather wonder if we have to have > > pci.0 when we pick PCIe for the root. > > Right. > > I've added Andrea Bologna to the CC list to get a libvirt perspective. Thanks for doing so: changes such as this one can have quite an impact on the upper layers of the stack, so the earliest libvirt is involved in the discussion the better. I'm going to go a step further and cross-post to libvir-list in order to give other libvirt contributors a chance to chime in too. > Andrea, > > To summarise the issue here: > * As I've said before the PAPR spec kinda-sorta abstracts the > difference between vanilla PCI and PCI-E > * However, because within qemu we're declaring the bus as PCI that > means some PCI-E devices aren't working right > * In particular it means that PCI-E extended config space isn't > available > > The proposal is to change (on newer machine types) the spapr PHB code > to declare a PCI-E bus instead. AIUI this still won't make the root > complex guest visible (which it's not supposed to be under PAPR), and > the guest shouldn't see a difference in most cases - it will still see > the PAPR abstracted PCIish bus, but will now be able to get extended > config space. > > The possible problem from a libvirt perspective is that doing this in > the simplest way in qemu would change the name of the default bus from > pci.0 to pcie.0. We have two suggested ways to mitigate this: > 1) Automatically create a PCI-E to PCI bridge, so that new machine > types will have both a pcie.0 and pci.0 bus > 2) Force the name of the bus to be pci.0, even though it's treated > as PCI-E in other ways. > > We're trying to work out exactly what will and won't cause trouble for > libvirt. Option 2) is definitely a no-no, as we don't want to be piling up even more hacks and architecture-specific code: the PCI Express Root Bus should be called pcie.0, just as it is on q35 and mach-virt machine types. Option 1) doesn't look too bad, but devices that are added automatically by QEMU are an issue since we need to hardcode knowledge of them into libvirt if we want the rest of the PCI address allocation logic to handle them correctly. Moreover libvirt now has the ability of building a legacy PCI topology without user intervention, if needed to plug in legacy devices, on machines that have a PCI Express Root Bus, which makes the additional bridge fully redundant... ... or at least it would, if we actually had a proper PCIe-to-PCI bridge; AFAIK, though, the closest we have is the i82801b11-bridge that is Intel-specific despite having so far been abused as a generic PCIe-to-PCI bridge. I'm not even sure whether it would work at all on ppc64. Moving from legacy PCI to PCI Express would definitely be an improvement, in my opinion. As mentioned, that's already the case for at least two other architectures, so the more we can standardize on that, the better. That said, considering that a big part of the PCI address allocation logic is based off whether the specific machine type exposes a legay PCI Root Bus or a PCI Express Root Bus, libvirt will need a way to be able to tell which one is which. Version checks are pretty much out of the question, as they fail as soon as downstream releases enter the picture. A few ways we could deal with the situation: 1) switch to PCI Express on newer machine types, and expose some sort of capability through QMP so that libvirt can know about the switch 2) switch between legacy PCI and PCI Express based on a machine type option. libvirt would be able to find out whether the option is available or not, and default to either <controller type='pci' model='pci-root'/> or <controller type='pci' model='pcie-root'/> based on that. In order to support multiple PHBs properly, those would have to be switchable with an option as well 3) create an entirely new machine type, eg. pseries-pcie or whatever someone with the ability to come up with decent names can suggest :) That would make ppc64 similar to x86, where i440fx and q35 have different root buses. libvirt would learn about the new machine type, know that it has a PCI Express Root Bus, and behave accordingly Option 1) would break horribly with existing libvirt versions, and so would Option 2) if we default to using PCI Express. Option 2) with default to legacy PCI and option 3) would work just fine with existing libvirt versions AFAICT, but wouldn't of course expose the new capabilities. Option 3) is probably the one that will be less confusing to users; we might even decide to take the chance and fix other small annoyances with the current pseries machine type, if there's any. On the other hand, it might very well be considered to be too big a hammer for such a small nail. -- Andrea Bolognani / Red Hat / Virtualization -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list