On Tue, Apr 16, 2013 at 12:05:40PM -0400, Laine Stump wrote:
> On 04/15/2013 05:58 PM, Michael S. Tsirkin wrote:
> > On Mon, Apr 15, 2013 at 11:27:03AM -0600, Alex Williamson wrote:
> >> On Fri, 2013-04-12 at 11:46 -0400, Laine Stump wrote:
> >>> On 04/11/2013 07:23 AM, Michael S. Tsirkin wrote:
> >>>> On Thu, Apr 11, 2013 at 07:03:56AM -0400, Laine Stump wrote:
> >>>>> On 04/10/2013 05:26 AM, Daniel P. Berrange wrote:
> >>>>>> On Tue, Apr 09, 2013 at 04:06:06PM -0400, Laine Stump wrote:
> >>>>>>> On 04/09/2013 04:58 AM, Daniel P. Berrange wrote:
> >>>>>>>> On Mon, Apr 08, 2013 at 03:32:07PM -0400, Laine Stump wrote:
> >>>>>>>> Actually I do wonder if we should represent a PCI root as two
> >>>>>>>> <controller> elements, one representing the actual PCI root
> >>>>>>>> device, and the other representing the host bridge that is
> >>>>>>>> built-in.
> >>>>>>>>
> >>>>>>>> Also we should use the actual model names, not 'pci-root' or
> >>>>>>>> 'pcie-root' but rather i440FX for "pc" machine type, and whatever
> >>>>>>>> the q35 model name is.
> >>>>>>>>
> >>>>>>>> - One PCI root with built-in PCI bus (ie today's setup)
> >>>>>>>>
> >>>>>>>>   <controller type="pci-root" index="0">
> >>>>>>>>     <model name="i440FX"/>
> >>>>>>>>   </controller>
> >>>>>>>>   <controller type="pci" index="0"> <!-- Host bridge -->
> >>>>>>>>     <address type='pci' domain='0' bus='0' slot='0'/>
> >>>>>>> Isn't this saying that the bridge connects to itself? (since bus 0 is
> >>>>>>> this bus)
> >>>>>>>
> >>>>>>> I understand (again, possibly wrongly) that the builtin PCI bus connects
> >>>>>>> to the chipset using its own slot 0 (that's why it's reserved), but
> >>>>>>> that's its address on itself. How is this bridge associated with the
> >>>>>>> pci-root?
> >>>>>>>
> >>>>>>> Ah, I *think* I see it - the domain attribute of the pci controller is
> >>>>>>> matched to the index of the pci-root controller, correct? But there's
> >>>>>>> still something strange about the <address> of the pci controller being
> >>>>>>> self-referential.
> >>>>>> Yes, the index of the pci-root matches the 'domain' of <address>
> >>>>> Okay, then the way that libvirt differentiates between a pci bridge that
> >>>>> is connected to the root, and one that is connected to a slot of another
> >>>>> bridge is 1) the "bus" attribute of the bridge's <address> matches the
> >>>>> "index" attribute of the bridge itself, and 2) "slot" is always 0. Correct?
> >>>>>
> >>>>> (The corollary of this is that if slot == 0 and bus != index, or bus ==
> >>>>> index and slot != 0, it is a configuration error.)
> >>>>>
> >>>>> I'm still unclear on the usefulness of the pci-root controller though -
> >>>>> all the necessary information is contained in the pci controller, except
> >>>>> for the type of root. But in the case of pcie root, I think you're not
> >>>>> allowed to connect a standard bridge to it, only a "dmi-to-pci-bridge"
> >>>>> (i82801b11-bridge)
> >>>> Yes you can connect a pci bridge to pcie-root.
> >>>> It's represented as a root complex integrated device.
> >> Is this accurate? Per the PCI express spec, any PCI express device
> >> needs to have a PCI express capability, which our pci-bridge does not.
> >> I think this is one of the main differences for our i82801b11-bridge,
> >> that it exposes itself as a root complex integrated endpoint, so we know
> >> it's effectively a PCIe-to-PCI bridge.
> > If it does not have an express link upstream it's not a
> > PCIe-to-PCI bridge, is it?
>
> To my untrained ear it sounds like you're disagreeing with yourself ???
>
> >> We'll be asking for trouble
> >> if/when we get guest IOMMU support if we are lax about using PCI-to-PCI
> >> bridges where we should have PCIe-to-PCI bridges.
> > I recall the spec saying somewhere that integrated endpoints are outside
> > the root complex hierarchy. I think IOMMU will simply not apply to
> > these.
>
> Correct me if I'm wrong - I think libvirt can ignore this bit of debate
> other than to use its result to determine which devices are allowed to
> connect to which other devices, right?

Yes.

> >> There are plenty of
> >> examples to the contrary of root complex integrated endpoints without an
> >> express capability, but that doesn't make it correct to the spec.
> > Is there something in the spec explicitly forbidding this? I merely
> > find: The PCI Express Capability structure is required for PCI Express
> > device Functions.
> > So if it's not an express device it does not have to have
> > an express capability?
> >
> > Maybe we should send an example dump to pci sig and ask them...
> >
> >>> ARGHH!! Just when I think I'm starting to understand *something* about
> >>> these devices...
> >>>
> >>> (later edit: after some coaching on IRC, I *think* I've got a bit better
> >>> handle on it.)
>
> (But I guess not good enough :-P)
>
> >>>
> >>>>>>>>   </controller>
> >>>>>>>>   <interface type='direct'>
> >>>>>>>>     ...
> >>>>>>>>     <address type='pci' domain='0' bus='0' slot='3'/>
> >>>>>>>>   </interface>
> >>>>>>>>
> >>>>>>>> - One PCI root with built-in PCI bus and extra PCI bridge
> >>>>>>>>
> >>>>>>>>   <controller type="pci-root" index="0">
> >>>>>>>>     <model name="i440FX"/>
> >>>>>>>>   </controller>
> >>>>>>>>   <controller type="pci" index="0"> <!-- Host bridge -->
> >>>>>>>>     <address type='pci' domain='0' bus='0' slot='0'/>
> >>>>>>>>   </controller>
> >>>>>>>>   <controller type="pci" index="1"> <!-- Additional bridge -->
> >>>>>>>>     <address type='pci' domain='0' bus='0' slot='1'/>
> >>>>>>>>   </controller>
> >>>>>>>>   <interface type='direct'>
> >>>>>>>>     ...
> >>>>>>>>     <address type='pci' domain='0' bus='1' slot='3'/>
> >>>>>>>>   </interface>
> >>>>>>>>
> >>>>>>>> - One PCI root with built-in PCI bus, PCI-E bus, and an extra PCI bridge
> >>>>>>>>   (ie possible q35 setup)
> >>>>>>> Why would a q35 machine have an i440FX pci-root?
> >>>>>> It shouldn't, that's a typo
> >>>>>>
> >>>>>>>>   <controller type="pci-root" index="0">
> >>>>>>>>     <model name="i440FX"/>
> >>>>>>>>   </controller>
> >>>>>>>>   <controller type="pci" index="0"> <!-- Host bridge -->
> >>>>>>>>     <address type='pci' domain='0' bus='0' slot='0'/>
> >>>>>>>>   </controller>
> >>>>>>>>   <controller type="pci" index="1"> <!-- Additional bridge -->
> >>>>>>>>     <address type='pci' domain='0' bus='0' slot='1'/>
> >>>>>>>>   </controller>
> >>>>>>>>   <controller type="pci" index="1"> <!-- Additional bridge -->
> >>>>>>>>     <address type='pci' domain='0' bus='0' slot='1'/>
> >>>>>>>>   </controller>
> >>>>>>> I think you did a cut-paste here and intended to change something, but
> >>>>>>> didn't - those two bridges are identical.
> >>>>>> Yep, the slot should be 2 in the second one
> >>>>>>
> >>>>>>>>   <interface type='direct'>
> >>>>>>>>     ...
> >>>>>>>>     <address type='pci' domain='0' bus='1' slot='3'/>
> >>>>>>>>   </interface>
> >>>>>>>>
> >>>>>>>> So if we later allowed for multiple PCI roots, then we'd have something
> >>>>>>>> like
> >>>>>>>>
> >>>>>>>>   <controller type="pci-root" index="0">
> >>>>>>>>     <model name="i440FX"/>
> >>>>>>>>   </controller>
> >>>>>>>>   <controller type="pci-root" index="1">
> >>>>>>>>     <model name="i440FX"/>
> >>>>>>>>   </controller>
> >>>>>>>>   <controller type="pci" index="0"> <!-- Host bridge 1 -->
> >>>>>>>>     <address type='pci' domain='0' bus='0' slot='0'/>
> >>>>>>>>   </controller>
> >>>>>>>>   <controller type="pci" index="0"> <!-- Host bridge 2 -->
> >>>>>>>>     <address type='pci' domain='1' bus='0' slot='0'/>
> >>>>>>>>   </controller>
> >>>
> >>> There is a problem here - within a given controller type, we will now
> >>> have the possibility of multiple controllers with the same index - the
> >>> differentiating attribute will be in the <address> subelement, which
> >>> could create some awkwardness. Maybe instead this should be handled with
> >>> a different model of pci controller, and we can add a "domain" attribute
> >>> at the toplevel rather than specifying an <address>?
> >> On real hardware, the platform can specify the _BBN (Base Bus Number =
> >> bus) and the _SEG (Segment = domain) of the host bridge. So perhaps you
> >> want something like:
> >>
> >>   <controller type="pci-host-bridge">
> >>     <model name="i440FX"/>
> >>     <address type="pci-host-bridge-addr" domain='1' bus='0'/>
> >>   </controller>
>
> The <address> element is intended to specify where a device or
> controller is connected *to*, not what bus/domain it *provides*. I think
> you're intending for this to provide domain 1 bus 0, so according to
> existing convention, you would want that information in the <controller>
> element attributes (e.g. for all other controller types, the generic
> "index" attribute is used to indicate a bus number when such a thing is
> appropriate for that type of controller).
>
> Anyway, I've simplified this a bit in my latest iteration - there are no
> separate "root" and "root bus" controllers, just a "pci-root" (for
> i440FX) or "pcie-root" (for q35), both of which provide a "pci" bus (I'm
> using the term loosely here), each with different restrictions about
> what can be connected.
>
> > Yes, we could specify segments, though it's not the same as
> > a domain as linux guests define it (I assume this is what libvirt wants
> > to call a domain): if memory serves a segment does not have to be a root
> > based hierarchy, linux domains are all root based.
>
> I'm not exactly sure of the meanings/implications of all those terms,
> but from the point of view of libvirt, as long as we can represent all
> possible connections between devices using the domain:bus:slot.function
> notation, I think it doesn't matter too much.
>
> > We are better off not specifying BBN for all buses I think -
>
> How would you differentiate between the different buses without some
> sort of identifier?
>
> > it's intended for multi-root support for legacy OSes.
> >
> >> "index" is confusing to me.
>
> index is being used just because that's been the convention for other
> controller types - when there are multiple controllers of the same type,
> each is given an index, and that's used in the "child" devices to
> indicate which of the parent controllers they connect to.
>
> > I'd prefer ID for bus not a number, I'm concerned users will
> > assume it's bus number and get confused by a mismatch.
>
> So you would rather that they were something like this?
>
>   <controller type='pci' bus='pci.0'>
>     <model type='pci-root'/>
>   </controller>
>   <interface type='blah'>
>     ...
>     <address type='pci' domain='0' bus='pci.0' slot='0' function='0'/>
>   </interface>
>
> The problem is that the use of numeric bus IDs is fairly deeply
> ingrained in libvirt; every existing libvirt guest config has device
> addresses specifying "bus='0'". Switching to using an alphanumeric ID
> rather than a simple number would require extra care to maintain
> backward compatibility with all those existing configs and previous
> versions of libvirt that might end up being the recipient of xml
> generated by a newer libvirt. Because of this, at the very least the
> pci.0 bus must be referred to as bus='0'; once we've done that, we might
> as well refer to them *all* numerically. (Anyway, even if names were
> allowed, I'm sure everybody would just call them '1', '2' (or at the
> very most "pci.1", "pci.2") etc.)
>
> >>>>>>>>   <interface type='direct'> <!-- NIC on host bridge 2 -->
> >>>>>>>>     ...
> >>>>>>>>     <address type='pci' domain='1' bus='0' slot='3'/>
> >>>>>>>>   </interface>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> NB this means that 'index' values can be reused against the
> >>>>>>>> <controller>, provided they are setup on different pci-roots.
> >>>>>>>>
> >>>>>>>>> (also note that it might happen that the bus number in libvirt's config
> >>>>>>>>> will correspond to the bus numbering that shows up in the guest OS, but
> >>>>>>>>> that will just be a happy coincidence)
> >>>>>>>>>
> >>>>>>>>> Does this make sense?
> >>>>>>>> Yep, I think we're fairly close.
> >>>>>>> What about the other types of pci controllers that are used by PCIe? We
> >>>>>>> should make sure they fit in this model before we settle on it.
> >>>>>> What do they do ?
> >>> (The descriptions of different models below tell what each of these
> >>> other devices does; in short, they're all just some sort of electronic
> >>> Lego to help connect PCI and PCIe devices into a tree.)
> >>>
> >>> Okay, I'll make yet another attempt at understanding these devices, and
> >>> suggesting how they can all be described in the XML. I'm thinking that
> >>> *all* of the express hubs, switch ports, bridges, etc can be described
> >>> in xml in the manner above, i.e.:
> >>>
> >>>   <controller type='pci' index='n'>
> >>>     <model type='xxx'/>
> >>>   </controller>
> >>>
> >>> and that the method for connecting a device to any of them would be by
> >>> specifying:
> >>>
> >>>   <address type='pci' domain='n' bus='n' slot='n' function='n'/>
> >>>
> >>> Any limitations about which devices/controllers can connect to which
> >>> controllers, and how many devices can connect to any particular
> >>> controller will be derived from the <model type='xxx'/>. (And, as we've
> >>> said before, although qemu doesn't assign each of these controllers a
> >>> numeric bus id, and although we can make no guarantee that the bus id we
> >>> use for a particular controller is what will be used by the guest
> >>> BIOS/OS, it's still a convenient notation and works well with other
> >>> hypervisors as well as qemu. I'll also note that when I run lspci on an
> >>> X58-based machine I have here, *all* of the relationships between all
> >>> the devices listed below are described with simple bus:slot.function
> >>> numbers.)
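
(To make the controller/device pairing concrete, here is a purely
illustrative fragment using that scheme - the model name, index, and
addresses are all hypothetical:

  <controller type='pci' index='1'>
    <model type='pci-bridge'/>
    <address type='pci' domain='0' bus='0' slot='4' function='0'/>
  </controller>
  <interface type='direct'>
    ...
    <address type='pci' domain='0' bus='1' slot='3' function='0'/>
  </interface>

i.e. the bridge plugs into slot 4 of the existing bus 0 and provides a
new bus 1, and the interface then plugs into slot 3 of that new bus.)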
> >>>
> >>> Here is a list of the pci controller model types and their restrictions
> >>> (thanks to mst and aw for repeating these over and over to me; I'm sure
> >>> I still have made mistakes, but at least it's getting closer).
> >>>
> >>>
> >>> <controller type='pci-root'>
> >>> ============================
> >>>
> >>> Upstream: nothing
> >>> Downstream: only a single pci-root-bus (implied)
> >>> qemu commandline: nothing (it's implied in the q35 machinetype)
> >>>
> >>> Explanation:
> >>>
> >>> Each machine will have a different controller called "pci-root" as
> >>> outlined above by Daniel. Two types of pci-root will be supported:
> >>> i440FX and q35. If a pci-root is not spelled out in the config, one will
> >>> be auto-added (depending on machinetype).
> >>>
> >>> An i440FX pci-root has an implicitly added pci-bridge at 0:0:0.0 (and
> >>> any bridge that has an address of slot='0' on its own bus is, by
> >>> definition, connected to a pci-root controller - the two are matched by
> >>> setting "domain" in the address of the pci-bridge to "index" of the
> >>> pci-root). This bridge can only have PCI devices added.
> >>>
> >>> A q35 pci-root also implies a different kind of pci-bridge device - one
> >>> that can only have PCIe devices/controllers attached, but is otherwise
> >>> identical to the pci-bridge added for i440FX. This bus will be called
> >>> "root-bus". (Note that there are generally followed conventions for what
> >>> can be connected to which slot on this bus, and we will probably follow
> >>> those conventions when building a machine, *but* we will not hardcode
> >>> this convention into libvirt; each q35 machine will be an empty slate.)
> >>>
> >>>
> >>> <controller type='pci'>
> >>> =======================
> >>>
> >>> This will be used for *all* of the following controller devices
> >>> supported by qemu:
> >>>
> >>> <model type='pcie-root-bus'/> (implicit/integrated)
> >>> ----------------------------
> >>>
> >>> Upstream: connect to pci-root controller *only*
> >>> Downstream: 32 slots, PCIe devices only, no hotplug.
> >>> qemu commandline: nothing (implicit in the q35-* machinetype)
> >>>
> >>> This controller is the bus described above that connects to a q35's
> >>> pci-root, and provides places for PCIe devices to connect. Examples are
> >>> root-ports, dmi-to-pci-bridges, sata controllers, integrated
> >>> sound/usb/ethernet devices (do any of those that can be connected to the
> >>> pcie-root-bus exist yet?).
> >>>
> >>> There is only one of these controllers, and it will *always* be
> >>> index='0', and will always have the following address:
> >>>
> >>>   <address type='pci' domain='0' bus='0' slot='0' function='0'/>
> >> Implicit devices make me nervous, why wouldn't this just be a pcie-root
> >> (or pcie-host-bridge)? If we want to support multiple host bridges,
> >> there can certainly be more than one, so the index='0' assumption seems
> >> to fall apart.
>
> That's when we need to start talking about a "domain" attribute, like this:
>
>   <controller type='pci' domain='1' index='0'>
>     <model type='pcie-root-bus'/>
>   </controller>
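
(As a purely illustrative aside: a PCIe device on the default
pcie-root-bus would then just be addressed straight onto bus 0. For
example - hypothetical device choice, using the conventional q35
location 0:1b.0 for integrated sound, written in decimal as slot 27:

  <sound model='ich9'>
    <address type='pci' domain='0' bus='0' slot='27' function='0'/>
  </sound>

assuming such an integrated device model is allowed on the
pcie-root-bus at all.)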
> >>> <model type='root-port'/> (ioh3420)
> >>> -------------------------
> >>>
> >>> Upstream: PCIe, connect to pcie-root-bus *only* (?)
> >> yes
> >>
> >>> Downstream: 1 slot, PCIe devices only (?)
> >> yes
> >>
> >>> qemu commandline: -device ioh3420,...
> >>>
> >>> These can only connect to the "pcie-root-bus" of a q35 (implying that
> >>> this bus will need to have a different model name than the simple
> >>> "pci-bridge").
> >>>
> >>>
> >>> <model type='dmi-to-pci-bridge'/> (i82801b11-bridge)
> >> I'm worried this name is either too specific or too generic. What
> >> happens when we add a generic pcie-bridge and want to use that instead
> >> of the i82801b11-bridge? The guest really only sees this as a
> >> PCIe-to-PCI bridge, it just happens that on q35 this attaches at the DMI
> >> port of the MCH.
>
> Hehe. Just using the name you (Alex) suggested :-)
>
> My use of the "generic" device *type* names rather than exact hardware
> model names is based on the idea that any given machinetype will have a
> set of these "building block" devices available, and as long as you use
> everything from the same "set" on a given machine, it doesn't really
> matter which set you use. Is this a valid assumption?
>
> >>
> >>> ---------------------------------
> >>>
> >>> (btw, what does "dmi" mean?)
> >> http://en.wikipedia.org/wiki/Direct_Media_Interface
> >>
> >>> Upstream: pcie-root-bus *only*
> >> And only to a specific q35 slot (1e.0) for the i82801b11-bridge.
> >>
> >>> Downstream: 32 slots, any PCI device, no hotplug (?)
> >> Yet, but I think this is where we want to implement ACPI based hotplug.
>
> Okay, but for now libvirt can just refrain from auto-addressing any
> user-created devices to that bus; we'll just make sure that there is
> always a "pci-bridge" plugged into it, and auto-addressed devices will
> all be put there.
>
> In the meantime if someone explicitly addresses a device to connect to
> the i82801b11-bridge, we'll let them do it, but if they try to
> hot-unplug it they will get an error.
>
> >>
> >>> qemu commandline: -device i82801b11-bridge,...
> >>>
> >>>
> >>> <model type='upstream-switch-port'/> (x3130-upstream)
> >>> ------------------------------------
> >>>
> >>> Upstream: PCIe, connect to pcie-root-bus, root-port, or
> >>> downstream-switch-port (?)
> >> yes
> >>
> >>> Downstream: 32 slots, connect *only* to downstream-switch-port
> >> I can't verify that there are 32 slots, mst? I've only set up downstream
> >> ports within slot 0.
>
> According to a discussion with Don Dutile on IRC yesterday, the
> downstream side of an upstream-switch-port has 32 "slots" with 8
> "functions" each, and each of these functions can have a
> downstream-switch-port connected. That said, he told me that in every
> case he's seen in the real world, all the downstream-switch-ports were
> connected to "function 0", effectively limiting it to 32
> downstreams/upstream.
>
> >>
> >>> qemu-commandline: -device x3130-upstream
> >>>
> >>> This is the upper side of a switch that can multiplex multiple devices
> >>> onto a single port. It's only useful when one or more downstream switch
> >>> ports are connected to it.
> >>>
> >>> <model type='downstream-switch-port'/> (xio3130-downstream)
> >>> --------------------------------------
> >>>
> >>> Upstream: connect *only* to upstream-switch-port
> >>> Downstream: 1 slot, any PCIe device
> >>> qemu commandline: -device xio3130-downstream
> >>>
> >>> You can connect one or more of these to an upstream-switch-port in order
> >>> to effectively plug multiple devices into a single PCIe port.
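
(Putting root-port and the two switch-port types together, a
hypothetical topology - all indexes, slots, and the endpoint device are
purely illustrative - might look like:

  <controller type='pci' index='1'>   <!-- root-port on the pcie-root-bus -->
    <model type='root-port'/>
    <address type='pci' domain='0' bus='0' slot='4' function='0'/>
  </controller>
  <controller type='pci' index='2'>   <!-- upstream-switch-port in the root-port's single slot -->
    <model type='upstream-switch-port'/>
    <address type='pci' domain='0' bus='1' slot='0' function='0'/>
  </controller>
  <controller type='pci' index='3'>   <!-- first downstream-switch-port -->
    <model type='downstream-switch-port'/>
    <address type='pci' domain='0' bus='2' slot='0' function='0'/>
  </controller>
  <controller type='pci' index='4'>   <!-- second downstream-switch-port -->
    <model type='downstream-switch-port'/>
    <address type='pci' domain='0' bus='2' slot='1' function='0'/>
  </controller>
  <interface type='direct'>
    ...
    <address type='pci' domain='0' bus='3' slot='0' function='0'/>
  </interface>

so multiple devices can ultimately hang off the single PCIe port
provided by the root-port; the interface here ends up behind the first
downstream port, on bus 3.)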
> >>>
> >>> <model type='pci-bridge'/> (pci-bridge)
> >>> --------------------------
> >>>
> >>> Upstream: PCI, connect to 1) pci-root, 2) dmi-to-pci-bridge, 3)
> >>> another pci-bridge
> >>> Downstream: any PCI device, 32 slots
> >>> qemu commandline: -device pci-bridge,...
> >>>
> >>> This differs from dmi-to-pci-bridge in that its upstream connection is
> >>> PCI rather than PCIe (so it will work on an i440FX system, which has no
> >>> root PCIe bus) and that hotplug is supported. In general, if a guest
> >>> will have any PCI devices, one of these controllers should be added.
> >>>
> >>> ===============================================================
> >>>
> >>> Comment: I'm not quite convinced that we really need the separate
> >>> "pci-root" device. Since 1) every pci-root will *always* have either a
> >>> pcie-root-bus or a pci-bridge connected to it, 2) the pci-root-bus will
> >>> only ever be connected to the pci-root, and 3) the pci-bridge that
> >>> connects to it will need special handling within the pci-bridge case
> >>> anyway, why not:
> >>>
> >>> 1) eliminate the separate pci-root controller type
> >>>
> >>> 2) within <controller type='pci'>, a new <model type='pci-root-bus'/>
> >>> will be added.
> >>>
> >>> 3) a pcie-root-bus will automatically be added for q35 machinetypes, and
> >>> pci-root-bus for any machinetype that supports a PCI bus (e.g. "pc-*")
> >>>
> >>> 4) model type='pci-root-bus' will behave like pci-bridge, except that it
> >>> will be an implicit device (nothing on qemu commandline) and it won't
> >>> need an <address> element (neither will pcie-root-bus).
> >> I think they should both have a domain + bus address to make it possible
> >> to build multi-domain/multi-host bridge systems. They do not use any
> >> slots though.
>
> Yes, I think I agree with that. But we don't have to implement the
> multiple-domain stuff today (since qemu doesn't support it yet), and
> when we do, I think we can just add a "domain" attribute to the main
> element of pci-root and pcie-root controllers.
>
> >>> 5) to support multiple domains, we can simply add a "domain" attribute
> >>> to the toplevel of controller.
> >>>
> >> Or this wouldn't even be necessary if we supported a 'pci-root-addr'
> >> address type for the above with the default being domain=0, bus=0? I
> >> suppose it doesn't matter whether it's a separate attribute or new
> >> address type though. Thanks,
>
> I think you're mixing up the purpose of the <address> element vs the
> "index" attribute in the main <controller> element. To clarify, take
> this example:
>
>   <controller type='pci' index='3'>
>     <model type='pci-bridge'/>
>     <address domain='0' bus='1' slot='9' function='0'/>
>   </controller>
>
> This controller is connected to slot 9 of the already-existing bus 1. It
> provides a bus 3 for other devices to connect to. If we wanted to start
> up a domain 1, we would do something like this:
>
>   <controller type='pci' domain='1' index='0'>
>     <model type='pci-root'/>
>   </controller>
>
> This would give us a PCI bus 0 in domain 1. You could then connect a
> pci-bridge to it like this:
>
>   <controller type='pci' domain='1' index='1'>
>     <model type='pci-bridge'/>
>     <address type='pci' domain='1' bus='0' slot='1' function='0'/>
>   </controller>
>
> The <address> tells us that this new bus connects to slot 1 of PCI bus 0
> in domain 1. The <controller domain='1' index='1'> tells us that there
> is now a new bus other devices can connect to that is at domain='1'
> bus='1'.
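
(Completing that hypothetical example, an endpoint device would then
attach to the new bus in the obvious way - device type and slot chosen
arbitrarily:

  <interface type='direct'>
    ...
    <address type='pci' domain='1' bus='1' slot='3' function='0'/>
  </interface>

i.e. slot 3 of the bus provided by the <controller domain='1'
index='1'> above.)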
>
> > Also AFAIK there's nothing in the spec that requires bus=0
> > to be root. The _BBN hack above is used sometimes to give !=0
> > bus numbers to roots.
>
> I don't really understand that, but do you think that 1) qemu would ever
> want/be able to model that, or that 2) anyone would ever have a
> practical reason for wanting to? It's really cool and all to be able to
> replicate any possible esoteric hardware configuration in a virtual
> machine, but it seems like the only practical use of replicating
> something like that would be for someone wanting to test what their OS
> does when there's no domain=0 in the hardware...

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list