On Wed, 2013-02-06 at 14:13 -0500, Laine Stump wrote:
> Now that qemu is getting the q35 machine type, libvirt needs to
> support it.
>
> As far as I understand, from libvirt's point of view, q35 is just
> another x86_64 system, but with a different set of implicit devices,
> and possibly some extra rules limiting which devices can be plugged
> into which bus/slot on the guest. That means that in order to support
> it, we will need to recognize when a q35-based machine type is being
> created, and auto-add all the implicit devices to libvirt's config
> model for that domain, then pay attention to the extra rules when
> assigning addresses for all the user-added devices.
>
> We already add implicit controllers/devices for pc-based machine
> types; as a matter of fact, currently, libvirt improperly assumes (for
> the purposes of adding implicit devices) that *every* virtual machine
> is based on the "pc" machine type (or rather it just doesn't pay
> attention), so it always adds all the implicit devices for a pc
> machine type for every domain. This of course is already incorrect for
> many (probably all?) non-x86 machine types, even before we add q35
> into the mix. To fix this, it might be reasonable (and arguably, it's
> necessary to fix the problem in a backward-compatible manner) to just
> set up a table of machinetype ==> implicit device lists, look up the
> machine type in this table, and add the devices needed for that
> machine type. This goes against libvirt's longstanding view of
> machinetype as being an opaque value that it merely passes through to
> qemu, but it's manageable for the existing machine types (even
> including q35), since it's a finite set. But it starts to be a pain to
> maintain when you think about future additions - yet another case
> where new functionality in qemu will require an update to libvirt
> before it can be fully used via libvirt.
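Laine's machinetype ==> implicit device table can be sketched as a plain lookup. This is a hypothetical Python illustration, not libvirt code. `IMPLICIT_DEVICES`, `implicit_devices_for`, and the device names are invented for the example. The q35 addresses are taken from the GMCH/ICH slot lists in the reply. The pc entries assume qemu's PIIX3-based layout. Unknown machine types return None rather than falling back to the pc list, because that silent fallback is the assumption that currently goes wrong for non-x86 guests.

```python
from typing import Dict, List, Optional, Tuple

# Each entry is a (device name, "slot.function") pair. Names and addresses
# are illustrative placeholders, not a verified copy of qemu's layout.
ImplicitDevices = List[Tuple[str, str]]

# Hypothetical machinetype ==> implicit device table.
IMPLICIT_DEVICES: Dict[str, ImplicitDevices] = {
    # Assumed PIIX3-based "pc" layout.
    "pc": [("host-bridge", "00.0"), ("isa-bridge", "01.0"), ("ide", "01.1")],
    # Addresses from the GMCH/ICH slot lists for q35.
    "q35": [("host-bridge", "00.0"), ("isa-bridge", "1f.0"),
            ("sata", "1f.2"), ("smbus", "1f.3")],
}


def implicit_devices_for(machine_type: str) -> Optional[ImplicitDevices]:
    """Return a copy of the implicit devices for machine_type, or None.

    None means "unknown machine type": the caller should not guess,
    instead of assuming every guest is pc-based.
    """
    devices = IMPLICIT_DEVICES.get(machine_type)
    return list(devices) if devices is not None else None
```

The maintenance cost Laine describes shows up directly here: every new qemu machine type needs a new row before libvirt knows its implicit devices.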
>
> In the long term, it would be very useful / more easily maintainable
> to have a qemu status command available via QMP which would return the
> list of implicit devices (and their PCI addresses) for any requested
> machine type. It would be necessary that this command be callable
> multiple times within a single execution of qemu, giving it a
> different machinetype each time. This way libvirt could first query
> the list of available machinetypes in this particular qemu binary,
> then request the list of implicit devices for each machine type
> (libvirt runs each available qemu binary *once* the first time it's
> requested, and caches all such capabilities information so that it
> doesn't need to re-run qemu again and again). My limited understanding
> of qemu's code is that qemu itself doesn't have a table of this
> information as data, but instead has lines of code that are executed
> to create it, thus making it impractical to provide the list of
> devices for a machinetype without actually instantiating a machine of
> that type. What's the feasibility of adding such a capability (and in
> the process likely making the list of implicit devices in qemu itself
> table/data driven rather than constructed with lines of code)?
>
> More questions:
>
> 1) It seems that the exact list of devices for the basic q35 machine
> type hasn't been settled on yet, is that correct?

I think what we have currently is just a stepping stone to a base
configuration. At a minimum, we're missing the PCI bridge attached to
the ICH, which is where I think libvirt should attach non-chipset
component devices. Next would be PCIe root ports, where emulated and
assigned PCIe devices could be attached.

> 2) Are there other issues aside from implicit controller devices I
> need to consider for q35?
> For example, are there any devices that (as
> I recall is the case for some devices on "pc") may or may not be
> present, but if they are present they are always at a particular PCI
> address (meaning that address must be reserved)? I've also just
> learned that certain types of PCIe devices must be plugged into
> certain locations on the guest bus? ("root complex" devices - is there
> a good source of background info to learn the meaning of terms like
> that, and the rules of engagement? libvirt will need to know/follow
> these rules.)

The GMCH (Graphics & Memory Controller Hub) defines:

00.0 - Host bridge
01.0 - x16 root port for external graphics
02.0,1 - Integrated graphics device (IGD)
03.0,1,2,3 - Management engine subsystem

And the ICH defines:

19.0 - Embedded ethernet (e1000e)
1a.* - UHCI/EHCI
1b.0 - HDA audio
1c.* - PCIe root ports
1d.* - UHCI/EHCI
1e.0 - PCI bridge
1f.0 - ISA bridge
1f.2,5 - SATA
1f.3 - SMBus

Personally, I think these slots should be reserved for only the
spec-defined devices, and I'm not all that keen on using the remaining
slots for anything else. Users should of course be allowed to put
anything anywhere, but libvirt auto-placement should follow some rules.

All of the above sit on what we now call bus pcie.0. This is a root
complex, which implies that all of the endpoints are root complex
integrated endpoints. Being an integrated endpoint places restrictions
on the device. I've already found out the hard way that Windows
actually cares about this and will ignore assigned PCI devices of type
"Endpoint" when they are attached to the root complex bus. (Endpoint,
root complex, etc. are defined in the PCIe spec; the slot usage above
is defined in the respective chipset specs.)

What I'd like to see is an implementation of the PCI bridge at 1e.0
that exposes a complete, virgin PCI bus. libvirt should use that as the
default location for any PCI device that's not a chipset component.
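The auto-placement rules above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not libvirt's address-assignment code. `place_device`, the device-kind keys, and the `bridge@1e.0` bus label are hypothetical, and only a subset of the GMCH/ICH table is included. Chipset components go only to their spec-defined pcie.0 address. Every other device defaults to the next free slot behind the PCI bridge at 1e.0, so the spare pcie.0 slots stay empty. Starting the bridge search at slot 1 is an assumption of the sketch.

```python
from typing import Dict, Set, Tuple

Address = Tuple[str, int, int]  # (bus, slot, function)

# A subset of the spec-defined pcie.0 addresses from the GMCH/ICH lists;
# the short device-kind keys are hypothetical names for this sketch.
CHIPSET_ADDRESSES: Dict[str, Tuple[int, int]] = {
    "host-bridge": (0x00, 0),
    "e1000e": (0x19, 0),
    "hda": (0x1B, 0),
    "pci-bridge": (0x1E, 0),
    "isa-bridge": (0x1F, 0),
    "sata": (0x1F, 2),
    "smbus": (0x1F, 3),
}

ROOT_BUS = "pcie.0"
BRIDGE_BUS = "bridge@1e.0"  # hypothetical label for the bridge's secondary bus


def place_device(kind: str, bridge_used: Set[int]) -> Address:
    """Auto-assign an address to a device the user gave no address for."""
    if kind in CHIPSET_ADDRESSES:
        # Chipset components only ever go to their spec-defined slot.
        slot, function = CHIPSET_ADDRESSES[kind]
        return (ROOT_BUS, slot, function)
    # Anything else defaults to the bus behind the PCI bridge at 1e.0,
    # leaving the remaining pcie.0 slots unused. Starting at slot 1 is a
    # conservative assumption, not a documented requirement.
    for slot in range(1, 32):
        if slot not in bridge_used:
            bridge_used.add(slot)
            return (BRIDGE_BUS, slot, 0)
    raise RuntimeError("no free slot left behind the PCI bridge")
```

Defaulting non-chipset devices to the bridge would also keep assigned "Endpoint" devices off the root complex unless a user places them there explicitly.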

We might be able to get away with installing our e1000 at 19.0, but
otherwise I'm thinking that the list only includes uhci/ehci, hda, ahci,
and the chipset components themselves (smbus, isa, root ports, etc.).
We don't have an "IGD", so our graphics should go on the PCI bus, and
the PCI bridge should include functioning VGA enable bits. Maybe QXL
wants to make itself a PCIe device, in which case it should be attached
behind a PCIe root port at slot 01.0. Secondary PCIe graphics attach
behind the root ports at 1c.*. This is the same framework within which
real hardware has to work.

Assigned devices get interesting due to the PCIe type. We've never had
any problems attaching PCIe devices to PCI buses on PIIX (though that
may be holding back our ability to support graphics passthrough), so
assigned devices can probably be attached to the PCI bus. It would be
more appropriate to attach "Endpoints" behind root ports and
"Integrated Endpoints" to the root complex. I've got some code that
will mangle the PCIe type to match its location in the topology, but it
needs more work. That should help make things more flexible.

> 3) What new types of devices/controllers must be supported for a
> properly functioning q35 machine?

AHCI, bridges, and root ports (we can skip root ports without PCIe
devices, but for hotplug we might want them fully populated; otherwise
everything gets hotplugged to the PCI bus).

Thanks,
Alex

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list