On 16/11/16 01:02, Andrea Bolognani wrote: > On Tue, 2016-11-01 at 13:46 +1100, David Gibson wrote: >> On Mon, Oct 31, 2016 at 03:10:23PM +1100, Alexey Kardashevskiy wrote: >>> >>> On 31/10/16 13:53, David Gibson wrote: >>>> >>>> On Fri, Oct 28, 2016 at 12:07:12PM +0200, Greg Kurz wrote: >>>>> >>>>> On Fri, 28 Oct 2016 18:56:40 +1100 >>>>> Alexey Kardashevskiy <aik@xxxxxxxxx> wrote: >>>>> >>>>>> >>>>>> At the moment sPAPR PHB creates a root buf of TYPE_PCI_BUS type. >>>>>> This means that vfio-pci devices attached to it (and this is >>>>>> a default behaviour) hide PCIe extended capabilities as >>>>>> the bus does not pass a pci_bus_is_express(pdev->bus) check. >>>>>> >>>>>> This changes adds a default PCI bus type property to sPAPR PHB >>>>>> and uses TYPE_PCIE_BUS if none passed; older machines get TYPE_PCI_BUS >>>>>> for backward compatibility as a bus type is used in the bus name >>>>>> so the root bus name becomes "pcie.0" instead of "pci.0". >>>>>> >>>>>> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx> >>>>>> --- >>>>>> >>>>>> What can possibly go wrong with such change of a name? >>>>>> From devices prospective, I cannot see any. >>>>>> >>>>>> libvirt might get upset as "pci.0" will not be available, >>>>>> will it make sense to create pcie.0 as a root bus and always >>>>>> add a PCIe->PCI bridge and name its bus "pci.0"? >>>>>> >>>>>> Or create root bus from TYPE_PCIE_BUS and force name to "pci.0"? >>>>>> pci_register_bus() can do this. >>>>>> >>>>>> >>>>>> --- >>>>>> hw/ppc/spapr.c | 5 +++++ >>>>>> hw/ppc/spapr_pci.c | 5 ++++- >>>>>> include/hw/pci-host/spapr.h | 1 + >>>>>> 3 files changed, 10 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c >>>>>> index 0b3820b..a268511 100644 >>>>>> --- a/hw/ppc/spapr.c >>>>>> +++ b/hw/ppc/spapr.c >>>>>> @@ -2541,6 +2541,11 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true); >>>>>> .driver = TYPE_SPAPR_PCI_HOST_BRIDGE, \ >>>>>> .property = "mem64_win_size", \ >>>>>> .value = "0", \ >>>>>> + }, \ >>>>>> + { \ >>>>>> + .driver = TYPE_SPAPR_PCI_HOST_BRIDGE, \ >>>>>> + .property = "root_bus_type", \ >>>>>> + .value = TYPE_PCI_BUS, \ >>>>>> }, >>>>>> >>>>>> static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index, >>>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c >>>>>> index 7cde30e..2fa1f22 100644 >>>>>> --- a/hw/ppc/spapr_pci.c >>>>>> +++ b/hw/ppc/spapr_pci.c >>>>>> @@ -1434,7 +1434,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) >>>>>> bus = pci_register_bus(dev, NULL, >>>>>> pci_spapr_set_irq, pci_spapr_map_irq, sphb, >>>>>> &sphb->memspace, &sphb->iospace, >>>>>> - PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS); >>>>>> + PCI_DEVFN(0, 0), PCI_NUM_PINS, >>>>>> + sphb->root_bus_type ? sphb->root_bus_type : >>>>>> + TYPE_PCIE_BUS); >>>>> >>>>> Shouldn't we ensure that sphb->root_bus_type is either TYPE_PCIE_BUS or >>>>> TYPE_PCI_BUS ? >>>> >>>> Yes, I think so. In fact, I think it would be better to make the >>>> property a boolean that just selects PCI-E, rather than this which >>>> exposes qemu (semi-)internal type names on the comamnd line. >>> >>> Sure, a "pcie-root" boolean property should do. >>> >>> However this is not my main concern, I rather wonder if we have to have >>> pci.0 when we pick PCIe for the root. >> >> Right. >> >> I've added Andrea Bologna to the CC list to get a libvirt perspective. > > Thanks for doing so: changes such as this one can have quite > an impact on the upper layers of the stack, so the earliest > libvirt is involved in the discussion the better. > > I'm going to go a step further and cross-post to libvir-list > in order to give other libvirt contributors a chance to chime > in too. > >> Andrea, >> >> To summarise the issue here: >> * As I've said before the PAPR spec kinda-sorta abstracts the >> difference between vanilla PCI and PCI-E >> * However, because within qemu we're declaring the bus as PCI that >> means some PCI-E devices aren't working right >> * In particular it means that PCI-E extended config space isn't >> available >> >> The proposal is to change (on newer machine types) the spapr PHB code >> to declare a PCI-E bus instead. AIUI this still won't make the root >> complex guest visible (which it's not supposed to be under PAPR), and >> the guest shouldn't see a difference in most cases - it will still see >> the PAPR abstracted PCIish bus, but will now be able to get extended >> config space. >> >> The possible problem from a libvirt perspective is that doing this in >> the simplest way in qemu would change the name of the default bus from >> pci.0 to pcie.0. We have two suggested ways to mitigate this: >> 1) Automatically create a PCI-E to PCI bridge, so that new machine >> types will have both a pcie.0 and pci.0 bus >> 2) Force the name of the bus to be pci.0, even though it's treated >> as PCI-E in other ways. >> >> We're trying to work out exactly what will and won't cause trouble for >> libvirt. > > Option 2) is definitely a no-no, as we don't want to be piling > up even more hacks and architecture-specific code: the PCI > Express Root Bus should be called pcie.0, just as it is on q35 > and mach-virt machine types. > > Option 1) doesn't look too bad, but devices that are added > automatically by QEMU are an issue since we need to hardcode > knowledge of them into libvirt if we want the rest of the PCI > address allocation logic to handle them correctly. > > Moreover libvirt now has the ability of building a legacy PCI > topology without user intervention, if needed to plug in > legacy devices, on machines that have a PCI Express Root Bus, > which makes the additional bridge fully redundant... > > ... or at least it would, if we actually had a proper > PCIe-to-PCI bridge; AFAIK, though, the closest we have is the > i82801b11-bridge that is Intel-specific despite having so far > been abused as a generic PCIe-to-PCI bridge. I'm not even > sure whether it would work at all on ppc64. > > Moving from legacy PCI to PCI Express would definitely be an > improvement, in my opinion. As mentioned, that's already the > case for at least two other architectures, so the more we can > standardize on that, the better. > > That said, considering that a big part of the PCI address > allocation logic is based off whether the specific machine > type exposes a legay PCI Root Bus or a PCI Express Root Bus, > libvirt will need a way to be able to tell which one is which. > > Version checks are pretty much out of the question, as they > fail as soon as downstream releases enter the picture. A > few ways we could deal with the situation: > > 1) switch to PCI Express on newer machine types, and > expose some sort of capability through QMP so that > libvirt can know about the switch > > 2) switch between legacy PCI and PCI Express based on a > machine type option. libvirt would be able to find out > whether the option is available or not, and default to > either > > <controller type='pci' model='pci-root'/> > > or > > <controller type='pci' model='pcie-root'/> > > based on that. In order to support multiple PHBs > properly, those would have to be switchable with an > option as well > > 3) create an entirely new machine type, eg. pseries-pcie > or whatever someone with the ability to come up with > decent names can suggest :) That would make ppc64 > similar to x86, where i440fx and q35 have different > root buses. libvirt would learn about the new machine > type, know that it has a PCI Express Root Bus, and > behave accordingly > > Option 1) would break horribly with existing libvirt > versions, and so would Option 2) if we default to using How exactly 1) will break libvirt? Migrating from pseries-2.7 to pseries-2.8 does not work anyway, and machines are allowed to behave different from version to version, what distinct difference will using "pseries-pcie-X.Y" make? I believe after we introduced the very first pseries-pcie-X.Y, we will just stop adding new pseries-X.Y. > PCI Express. Option 2) with default to legacy PCI and > option 3) would work just fine with existing libvirt > versions AFAICT, but wouldn't of course expose the new > capabilities. > > Option 3) is probably the one that will be less confusing > to users; we might even decide to take the chance and fix > other small annoyances with the current pseries machine > type, if there's any. On the other hand, it might very well > be considered to be too big a hammer for such a small nail. -- Alexey -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list