Re: [PATCH 2/2] HACK: qemu: aarch64: Use virtio-pci if user specifies PCI controller

Laine Stump <laine@xxxxxxxxx> · Wed, 9 Mar 2016 14:09:22 -0500

On 03/09/2016 09:54 AM, Daniel P. Berrange wrote:
On Wed, Mar 09, 2016 at 01:40:36PM +0100, Andrea Bolognani wrote:
On Fri, 2016-03-04 at 17:05 -0500, Laine Stump wrote:
I'm not sure I fully understand all of the above, but I'll pitch
in with my own proposal regardless :)

First, we make sure that

     <controller type='pci' index='0' model='pcie-root'/>

is always added automatically to the domain XML when using the
mach-virt machine type. Then, if

     <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
     <controller type='pci' index='2' model='pci-bridge'/>

is present as well we default to virtio-pci, otherwise we use
the current default of virtio-mmio. This should allow management
applications, based on knowledge about the guest OS, to easily
pick between the two address schemes.

Does this sound like a good idea?
... or a variation of that, anyway :-)

What I think: If there are *any* pci controllers *beyond* pcie-root, or
if there are any devices that already have a PCI address, then assign
PCI addresses, else use mmio.
This sounds reasonable.

However, I'm wondering if we shouldn't be even more explicit about
this... From libvirt's point of view we just need to agree on some
sort of "trigger" that causes it to allocate PCI addresses instead
of MMIO addresses, and for that purpose "any PCI controller or any
device with a PCI address" is perfectly fine; looking at that from
the point of view of an user, though? Not sure about it.

What about adding something like

   <preferences>
     <preference name="defaultAddressType" value="pci"/>
   </preferences>

to the domain XML? AFAICT libvirt doesn't have any element that
could be used for this purpose at the moment, but maybe having a
generic way to set domain-wide preferences could be useful in
other situations as well?
[snip]

Looking at this mail and laine's before I really get the impression we
are over-thinking this all. The automatic address assignment by libvirt
was written such that it matched what QEMU would have done, so that we
could introduce the concept of device addressing without breaking
existing apps which didn't supply addresses. The automatic addressing
logic certainly wasn't ever intended to be able to implement arbitrary
policies.

As a general rule libvirt has always aimed to stay out of the policy
business, and focus on providing the mechanism only. So when we start
talking about adding  <preferences> to the XML this is a sure sign
that we've gone too far into trying to implement policy in libvirt.

>From the POV of being able to tell libvirt to assign PCI instead of
MMIO addresses for the virt machine, I think it suffices to have
libvirt accept a simple  <address type="pci"/> element (ie with no
bus/slot/func filled in).

That's just a more limited case of the same type of feature creep though 
- you're specifying a preference for what type of address you want, just 
in a different place. (The odd thing is that we're adding it only to 
remedy what is apparently a temporary problem, and even then it's a 
problem that wouldn't exist if we just said simply this:  "If 
virtio-mmio is specified, then use virtio-mmio; If no address is 
specified, use pci". This puts the burden on people insisting on the old 
/ soon-to-be-deprecated (?) virtio-mmio address type to add a little 
something to the XML (and a very simple little something at that!). For 
a short time, this could cause problems for people who create new 
domains using qemu binaries that don't have a pci bus on aarch64/virt 
yet, but that will be easily solvable (by adding "<address 
type='virtio-mmio'/>", which is nearly as simple as typing "<address 
type='pci'/>").

A downside of setting your preference with <address type="pci"/> vs. 
having the preference in a separate element is that you can no longer 
cause a "re-addressing" of all the devices by simply removing all the 
<address> elements (and it's really that that got me thinking about 
adding hints/preferences in the <target> element). I don't know how 
important that is; I suppose when the producer of the XML is some other 
software it's a non-issue, when the producer is a human using virsh edit 
it would make life much simpler once in awhile (and the suggestion in 
the previous paragraph would eliminate that problem anyway).

So is there a specific reason why we need to keep virtio-mmio as the 
default, and require explicitly saying "address type='pci'" in order to 
get a PCI address?

If applictions care about assigning devices to specify PCI buses,
ie to distinguish between PCI & PCIe, or to pick a different bus
when there is one bus per NUMA node, then really it is time for
the application to start specifying the full <address> element
with all details, and not try to invent ever more complex policies
inside libvirt which will never deal with all application use cases.

First, I agree with your vigilance against unnecessary feature creep. 
Both because keeping things simple makes it easier to maintain and use, 
and also because it's very difficult (or even impossible) to change 
something once it's gone in, in the case that you have second thoughts.

But, expanding beyond just the aarch64/virt mmio vs. pci problem (which 
I had already done in this thread, and which is what got us into this 
"feature creep" discussion in the first place :-)...

I remember when we were first adding support for Q35 that we said it 
should be as easy to create a domain with Q35 machinetype as it is for 
440fx, i.e. you should be able to do it without manually specifying any 
PCI addresses (that's why we have the strange default bus hierarchy, 
with a dmi-to-pci-bridge and a pci-bridge, and all devices going onto 
the pci-bridge). But of course 440fx is very simplistic - all slots are 
standard PCI and all are hotpluggable. On Q35 there could be room for 
choices though, in particular PCI vs PCIe and hotpluggable vs. not. (and 
yeah, also what NUMA node the bus is on; I agree that one is "getting 
out there"). So since there are choices, libvirt has by definition begun 
implementing policy when it auto-assigns *any* PCI address (current 
policy is "require all auto-assigned device addresses to be hotpluggable 
standard PCI). And if libvirt doesn't provide an easy way to choose, it 
will have implemented a defacto mandatory policy (unless you force full 
specification of PCI addresses, which goes against the "make it easy" goal).

And the current policy is looking (for Q35 and aarch64/virt at least) 
less and less like the correct one - in the years since it was put in, 
1) we've learned that qemu doesn't care, and will not ever care, that a 
PCI device has been plugged into a PCIe slot, 2) we now have support for 
several PCIe controllers we didn't previously have, 3) people have 
started complaining that their devices are in PCI slots rather than 
PCIe, 4) even though it was stated at the time I wrote the code to 
auto-add a pci-bridge to every Q35 domain that pci-bridge's failure to 
support hotplug was a temporary bug, I just learned yesterday that it 
apparently still doesn't work, and 5) some platforms (e.g. our favorite 
aarch64/virt) are emulating a hardware platform that has *never* had 
standard PCI slots, only PCIe.

Beyond that, there is no place that provides a simple encoding of which 
type of controller provides which type of slot, and what is allowed to 
plug into what. If you require the management application/human to 
manually specify all the PCI addresses as soon as they have a need for 
one of these basic characteristics, then not only has it become 
cumbersome to define a domain (because the management app has to 
maintain a data structure to keep track of which PCI addresses are in 
use and which aren't), but it means that the management application also 
needs to know all sorts of rules about which PCI controllers are 
actually pcie vs. pci, and which accept hotplug devices vs. which don't, 
as well as things like the min/max slot number for each controller, and 
which ones can plug into where, e.g. a pcie-root-port can only plug into 
pcie-root or pcie-expander-bus, and a pcie-switch-downstream-port can 
only plug into a pcie-switch-upstream-port, etc. Requiring a management 
app to get all of that right just so that they can pick between a 
hotpluggable and non-hotpluggable slot seems like an overly large burden 
(and prone to error).

In the end, if libvirt makes it simple for the management app to specify 
what kind of slot it wants, rather than requiring it to specify the 
exact slot, then libvirt isn't making any policy decisions, it's just 
making it easier for the management app to implement its own policy 
decisions, without requiring the management app to know all the details 
about which controller implements what kind of connection etc, and that 
does seem to fit within libvirt's purpose.

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list