On 03/09/2016 02:09 PM, Laine Stump wrote:
> On 03/09/2016 09:54 AM, Daniel P. Berrange wrote:
>> On Wed, Mar 09, 2016 at 01:40:36PM +0100, Andrea Bolognani wrote:
>>> On Fri, 2016-03-04 at 17:05 -0500, Laine Stump wrote:
>>>>> I'm not sure I fully understand all of the above, but I'll pitch in with my own proposal regardless :)
>>>>>
>>>>> First, we make sure that
>>>>>
>>>>>   <controller type='pci' index='0' model='pcie-root'/>
>>>>>
>>>>> is always added automatically to the domain XML when using the mach-virt machine type. Then, if
>>>>>
>>>>>   <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
>>>>>   <controller type='pci' index='2' model='pci-bridge'/>
>>>>>
>>>>> is present as well we default to virtio-pci, otherwise we use the current default of virtio-mmio. This should allow management applications, based on knowledge about the guest OS, to easily pick between the two address schemes.
>>>>>
>>>>> Does this sound like a good idea?
>>>>
>>>> ... or a variation of that, anyway :-)
>>>>
>>>> What I think: if there are *any* pci controllers *beyond* pcie-root, or if there are any devices that already have a PCI address, then assign PCI addresses, else use mmio.
>>>
>>> This sounds reasonable.
>>>
>>> However, I'm wondering if we shouldn't be even more explicit about this... From libvirt's point of view we just need to agree on some sort of "trigger" that causes it to allocate PCI addresses instead of MMIO addresses, and for that purpose "any PCI controller or any device with a PCI address" is perfectly fine; looking at that from the point of view of a user, though? Not sure about it.
>>>
>>> What about adding something like
>>>
>>>   <preferences>
>>>     <preference name="defaultAddressType" value="pci"/>
>>>   </preferences>
>>>
>>> to the domain XML? AFAICT libvirt doesn't have any element that could be used for this purpose at the moment, but maybe having a generic way to set domain-wide preferences could be useful in other situations as well?
>>
>> [snip]
>>
>> Looking at this mail and Laine's before it, I really get the impression we are over-thinking this all. The automatic address assignment by libvirt was written such that it matched what QEMU would have done, so that we could introduce the concept of device addressing without breaking existing apps which didn't supply addresses. The automatic addressing logic certainly wasn't ever intended to be able to implement arbitrary policies.
>>
>> As a general rule libvirt has always aimed to stay out of the policy business, and focus on providing the mechanism only. So when we start talking about adding <preferences> to the XML this is a sure sign that we've gone too far into trying to implement policy in libvirt.
>>
>> From the POV of being able to tell libvirt to assign PCI instead of MMIO addresses for the virt machine, I think it suffices to have libvirt accept a simple <address type="pci"/> element (ie with no bus/slot/func filled in).
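As a concrete illustration of Dan's suggestion, the bare element on a device might look like the sketch below (the interface is just an example device, and the specific address values are made up; the fully specified form an app would otherwise have to write is shown for contrast):

  <interface type='network'>
    <source network='default'/>
    <model type='virtio'/>
    <!-- no bus/slot/func: ask libvirt to pick a suitable PCI address -->
    <address type='pci'/>
  </interface>

  <!-- the fully specified alternative -->
  <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>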
> That's just a more limited case of the same type of feature creep though - you're specifying a preference for what type of address you want, just in a different place. The odd thing is that we're adding it only to remedy what is apparently a temporary problem, and even then it's a problem that wouldn't exist if we just said simply this: "If virtio-mmio is specified, then use virtio-mmio; if no address is specified, use pci". This puts the burden on people insisting on the old / soon-to-be-deprecated (?) virtio-mmio address type to add a little something to the XML (and a very simple little something at that!). For a short time, this could cause problems for people who create new domains using qemu binaries that don't have a pci bus on aarch64/virt yet, but that will be easily solvable (by adding "<address type='virtio-mmio'/>", which is nearly as simple as typing "<address type='pci'/>").
>
> A downside of setting your preference with <address type="pci"/> vs. having the preference in a separate element is that you can no longer cause a "re-addressing" of all the devices by simply removing all the <address> elements (and it's really that that got me thinking about adding hints/preferences in the <target> element). I don't know how important that is; I suppose when the producer of the XML is some other software it's a non-issue, but when the producer is a human using virsh edit it would make life much simpler once in a while (and the suggestion in the previous paragraph would eliminate that problem anyway).
>
> So is there a specific reason why we need to keep virtio-mmio as the default, and require explicitly saying "address type='pci'" in order to get a PCI address?

I explained this in my reply to Dan just now, but basically all currently released distros don't work with virtio-pci. However, long term I've suggested from the start that we switch to virtio-pci by default, keying off a new enough -M virt version as an arbitrary marker in time when hopefully most common distros work with virtio-pci.

>> If applications care about assigning devices to specific PCI buses, ie to distinguish between PCI & PCIe, or to pick a different bus when there is one bus per NUMA node, then really it is time for the application to start specifying the full <address> element with all details, and not try to invent ever more complex policies inside libvirt which will never deal with all application use cases.
>
> First, I agree with your vigilance against unnecessary feature creep, both because keeping things simple makes it easier to maintain and use, and also because it's very difficult (or even impossible) to change something once it's gone in, in the case that you have second thoughts.
>
> But, expanding beyond just the aarch64/virt mmio vs. pci problem (which I had already done in this thread, and which is what got us into this "feature creep" discussion in the first place :-)...
>
> I remember when we were first adding support for Q35 that we said it should be as easy to create a domain with the Q35 machinetype as it is for 440fx, i.e. you should be able to do it without manually specifying any PCI addresses (that's why we have the strange default bus hierarchy, with a dmi-to-pci-bridge and a pci-bridge, and all devices going onto the pci-bridge). But of course 440fx is very simplistic - all slots are standard PCI and all are hotpluggable. On Q35 there could be room for choices though, in particular PCI vs PCIe and hotpluggable vs. not (and yeah, also what NUMA node the bus is on; I agree that one is "getting out there"). So since there are choices, libvirt has by definition begun implementing policy when it auto-assigns *any* PCI address (current policy is "require all auto-assigned device addresses to be hotpluggable standard PCI"). And if libvirt doesn't provide an easy way to choose, it will have implemented a de facto mandatory policy (unless you force full specification of PCI addresses, which goes against the "make it easy" goal).
>
> And the current policy is looking (for Q35 and aarch64/virt at least) less and less like the correct one - in the years since it was put in, 1) we've learned that qemu doesn't care, and will not ever care, that a PCI device has been plugged into a PCIe slot, 2) we now have support for several PCIe controllers we didn't previously have, 3) people have started complaining that their devices are in PCI slots rather than PCIe, 4) even though it was stated at the time I wrote the code to auto-add a pci-bridge to every Q35 domain that pci-bridge's failure to support hotplug was a temporary bug, I just learned yesterday that it apparently still doesn't work, and 5) some platforms (e.g. our favorite aarch64/virt) are emulating a hardware platform that has *never* had standard PCI slots, only PCIe.
>
> Beyond that, there is no place that provides a simple encoding of which type of controller provides which type of slot, and what is allowed to plug into what. If you require the management application/human to manually specify all the PCI addresses as soon as they have a need for one of these basic characteristics, then not only has it become cumbersome to define a domain (because the management app has to maintain a data structure to keep track of which PCI addresses are in use and which aren't), but it means that the management application also needs to know all sorts of rules about which PCI controllers are actually pcie vs. pci, and which accept hotplug devices vs. which don't, as well as things like the min/max slot number for each controller, and which ones can plug into where, e.g. a pcie-root-port can only plug into pcie-root or pcie-expander-bus, and a pcie-switch-downstream-port can only plug into a pcie-switch-upstream-port, etc. Requiring a management app to get all of that right just so that they can pick between a hotpluggable and non-hotpluggable slot seems like an overly large burden (and prone to error).
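To make those plug-in rules concrete, here is a rough, non-authoritative sketch of the controller models in question (indexes are arbitrary, and addresses are omitted so libvirt would assign them; the comments just restate the rules above):

  <controller type='pci' index='0' model='pcie-root'/>
  <!-- plugs into pcie-root; provides standard PCI slots (not hotpluggable) -->
  <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
  <!-- plugs into a standard PCI slot; meant to provide hotpluggable PCI slots (but see point 4 above) -->
  <controller type='pci' index='2' model='pci-bridge'/>
  <!-- can only plug into pcie-root or a pcie-expander-bus -->
  <controller type='pci' index='3' model='pcie-root-port'/>
  <!-- plugs into a pcie-root-port -->
  <controller type='pci' index='4' model='pcie-switch-upstream-port'/>
  <!-- can only plug into a pcie-switch-upstream-port -->
  <controller type='pci' index='5' model='pcie-switch-downstream-port'/>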
> In the end, if libvirt makes it simple for the management app to specify what kind of slot it wants, rather than requiring it to specify the exact slot, then libvirt isn't making any policy decisions; it's just making it easier for the management app to implement its own policy decisions, without requiring the management app to know all the details about which controller implements what kind of connection etc., and that does seem to fit within libvirt's purpose.

I guess the main thing would be to understand use cases... outside of aarch64, is there a real reason for q35 that apps might need one address policy over another? If it's something super obscure, like only for testing or dev, then I think asking people to do it manually is fine.

At least for the arm case I think we can sidestep the question by adding the <address type='pci'/> allocation request, and flipping the default from virtio-mmio to virtio-pci at some future point.

- Cole

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list