On Wed, 4 Mar 2020, 14:51 Daniel P. Berrangé, <berrange@xxxxxxxxxx> wrote:
>
> We've been doing a lot of refactoring of code in recent times, and also
> have plans for significant infrastructure changes. We still need to
> spend time delivering interesting features to users / applications.
> This mail is to introduce an idea for a solution to a specific
> area applications have had long term pain with libvirt's current
> "mechanism, not policy" approach - device addressing. This is a way
> for us to show brand new ideas & approaches for what the libvirt
> project can deliver in terms of management APIs.
>
> To set expectations straight: I have written no code for this yet,
> merely identified the gap & conceptual solution.
>
>
> The device addressing problem
> =============================
>
> One of the key jobs libvirt does when processing a new domain XML
> configuration is to assign addresses to all devices that are present.
> This involves adding various device controllers (PCI bridges, PCI root
> ports, IDE/SCSI buses, USB controllers, etc) if they are not already
> present, and then assigning PCI, USB, IDE, SCSI, etc, addresses to each
> device so they are associated with controllers. When libvirt spawns a
> QEMU guest, it will pass full address information to QEMU.
>
> Libvirt, as a general rule, aims to avoid defining and implementing
> policy around expansion of guest configuration / defaults, however, it
> is inescapable in the case of device addressing due to the need to
> guarantee a stable hardware ABI to make live migration and save/restore
> to disk work. The policy that libvirt has implemented for device
> addressing is, as much as possible, the same as the addressing scheme
> QEMU would apply itself.
>
> While libvirt succeeds in its goal of providing a stable hardware ABI,
> the addressing scheme used is not well suited to all deployment
> scenarios of QEMU.
> This is an inevitable result of having a specific
> assignment policy implemented in libvirt which has to trade off mutually
> incompatible use cases/goals.
>
> When the libvirt addressing policy has not been sufficient, management
> applications are forced to take on address assignment themselves,
> which is a massive non-trivial job with many subtle problems to
> consider.
>
> Places where libvirt's addressing is insufficient for PCI include
>
>  * Setting up multiple guest NUMA nodes and associating devices to
>    specific nodes
>  * Pre-emptive creation of extra PCIe root ports, to allow for later
>    device hotplug on PCIe topologies
>  * Determining whether to place a device on a PCI or PCIe bridge
>  * Controlling whether a device is placed into a hotpluggable slot
>  * Controlling whether a PCIe root port supports hotplug or not
>  * Determining whether to place all devices on distinct slots or
>    buses, vs grouping them all into functions on the same slot
>  * Ability to expand the device addressing without being on the
>    hypervisor host

(I don't understand the last bullet point)

>
> Libvirt wishes to avoid implementing many different address assignment
> policies. It also wishes to keep the domain XML as a representation
> of the virtual hardware, not add a bunch of properties to it which
> merely serve as tunable input parameters for device addressing
> algorithms.
>
> There is thus a dilemma here. Management applications increasingly
> need fine grained control over device addressing, while libvirt
> doesn't want to expose fine grained policy controls via the XML.
>
>
> The new libvirt-devaddr API
> ===========================
>
> The way out of this is to define a brand new virt management API
> which tackles this specific problem in a way that addresses all the
> problems mgmt apps have with device addressing and explicitly
> provides a variety of policy impls with tunable behaviour.
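
To check that I read "policy impls with tunable behaviour" the way it is
meant, here is a rough Python sketch of two of the PCI bullets above as
tunables: reserving spare PCIe root ports for later hotplug, and placing
devices on distinct slots vs packing them as functions of one slot, with
a per-device override of the global setting. All names (`assign`,
`spare_root_ports`, `group_functions`, `pcie-root-portN`) and the layout
rules (slot 0 of bus 0 reserved, one root port per device, device at
slot 0 of its port's bus) are my own hypothetical illustration, not
libvirt's current algorithm nor the proposed API. Multifunction flags,
NUMA and running out of bus 0 are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Addr:
    bus: int
    slot: int
    function: int = 0


@dataclass
class Device:
    name: str
    addr: Optional[Addr] = None    # None: "please pick an address for me"
    group: Optional[bool] = None   # per-device override of group_functions


def assign(devices: List[Device], spare_root_ports: int = 0,
           group_functions: bool = False) -> List[Device]:
    """Expand a minimal device list into a fully addressed one, adding
    hypothetical 'pcie-root-portN' controllers where the policy needs them."""
    explicit = [d.addr for d in devices if d.addr is not None]
    used = {(a.bus, a.slot) for a in explicit}
    used.add((0, 0))  # assumption: bus 0 slot 0 is taken by the host bridge
    next_bus = 1 + max((a.bus for a in explicit), default=0)
    out: List[Device] = []

    def free_root_slot() -> int:
        slot = 1
        while (0, slot) in used:
            slot += 1
        if slot > 31:
            raise RuntimeError("no free slot left on bus 0")
        used.add((0, slot))
        return slot

    def add_root_port() -> int:
        # Each root port takes a slot on bus 0 and provides a new bus number.
        nonlocal next_bus
        bus = next_bus
        next_bus += 1
        out.append(Device(f"pcie-root-port{bus}", Addr(0, free_root_slot())))
        return bus

    group_slot, func = 0, 8  # func == 8 means "no partially filled slot"
    for d in devices:
        grouped = group_functions if d.group is None else d.group
        if d.addr is not None:
            out.append(d)                  # keep caller-chosen addresses
        elif grouped:
            if func == 8:                  # start a new multi-function slot
                group_slot, func = free_root_slot(), 0
            out.append(Device(d.name, Addr(0, group_slot, func)))
            func += 1
        else:
            # add_root_port() appends the port before the device itself
            out.append(Device(d.name, Addr(add_root_port(), 0)))
    for _ in range(spare_root_ports):      # empty ports for later hotplug
        add_root_port()
    return out
```

For example, `assign([Device("net0"), Device("disk0")], spare_root_ports=2)`
yields four root ports: two carrying net0 and disk0 at slot 0 of buses 1
and 2, and two left empty for hotplug. The point is that the knobs live
in the call, not in the domain XML.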
>
> By "new API", I actually mean an entirely new library, completely
> distinct from libvirt.so, or anything else we've delivered so
> far. The closest we've come to delivering something at this kind
> of conceptual level, would be the abortive attempt we made with
> "libvirt-builder" to deliver a policy-driven API instead of mechanism
> based. This proposal is still quite different from that attempt.
>
> At a high level
>
>  * The new API is "libvirt-devaddr" - short for "libvirt device addressing"
>
>  * As input it will take
>
>     1. The guest CPU architecture and machine type
>     2. A list of global tunables specifying desired behaviour of the
>        address assignment policy
>     3. A minimal list of devices needed in the virtual machine, with
>        optional addresses and optional per-device tunables to override
>        the global tunables
>
>  * As output it will emit
>
>     1. A fully expanded list of devices needed in the virtual machine,
>        with addressing information sufficient to ensure stable hardware ABI
>
> Initially the API would implement something that behaves the same
> way as libvirt's current address assignment API.
>
> The intended usage would be
>
>  * Mgmt application makes a minimal list of devices they want in
>    their guest
>  * List of devices is fed into libvirt-devaddr API
>  * Mgmt application gets back a full list of devices & addresses
>  * Mgmt application writes a libvirt XML doc using this full list &
>    addresses
>  * Mgmt application creates the guest in libvirt
>
> IOW, this new "libvirt-devaddr" API is intended to be used prior to
> creating the XML that is used by libvirt. The API could also be used
> prior to needing to hotplug a new device to an existing guest.
> This API is intended to be a deliverable of the libvirt project, but
> it would be completely independent of the current libvirt API. Most
> especially note that it would NOT use the domain XML in any way.
> This gives applications maximum flexibility in how they consume this
> functionality, not trying to force a way to build domain XML.

This procedure forces Mgmt to learn a new language to describe device
placement. Mgmt (or should I just say "we"?) currently expresses the
"minimal list of devices" in XML form and passes it to libvirt. Here we
are asked to pass it once to libvirt-devaddr, parse its output, and feed
it as XML to libvirt.

I believe it would be easier to use the domxml as the base language for
the new library, too. libvirt-devaddr would accept it with various hints
(expressed as its own extension to the XML?) such as "place all of these
devices in the same NUMA node", "keep on root bus" or "separate these two
chattering devices to their own bus". The output of libvirt-devaddr would
be a domxml with <devices> filled with controllers and addresses, readily
available for consumption by libvirt.

>
>
> It would have greater freedom in its API design, making different
> choices from libvirt.so on topics such as programming language (C vs
> Go vs Python etc), API stability timeframe (forever stable vs sometimes
> changing API), data formats (structs, vs YAML/JSON vs XML etc), and of
> course the conceptual approach (policy vs mechanism)
>
> The expectation is that this new API would be most likely to be
> consumed by KubeVirt, OpenStack, Kata, as the list of problems shown
> earlier is directly based on issues seen working with KubeVirt &
> OpenStack in particular.

And thank you for that.

> It is not limited to these applications and
> is broadly useful as a conceptual thing.
>
> It would be a goal that this API should also be used by libvirt
> itself to replace its current internal device addressing impl.
> Essentially the new API should be seen as a way to expose/extract
> the current libvirt internal algorithm, making it available to
> applications in a flexible manner.
> I don't anticipate actually copying
> the current addressing code in libvirt as-is, but it would certainly
> serve as reference for the kind of logic we need to implement, so you
> might consider it a "port" or "rewrite" in some very rough sense.
>
> I think this new API concept is a good way for the project to make a start
> in using Go for libvirt. The functionality covered has a clearly defined
> scope limit, making it practical to deliver a real impl in a reasonably
> short time frame. Extracting this will provide a real world benefit to
> our application consumers, solving many long standing problems they have
> with libvirt, and thus justify the effort in doing this work in libvirt
> in a non-C language. The main question mark would be about how we might
> make this functionality available to Python apps if we chose Go. It is
> possible to expose a C API from Go, and we would need this to consume it
> from libvirt. There is then the need to manually write a Python API binding
> which is tedious work.
>
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
>
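
Coming back to my domxml suggestion: here is a minimal Python sketch
(standard library `xml.etree.ElementTree` only) of the domxml-in,
domxml-out shape. It takes a domain XML whose <devices> may lack
addresses and returns it with <address type='pci' domain=... bus=...
slot=... function=.../> elements filled in, the same element libvirt
itself writes. `fill_addresses`, `wants_pci`, the device subset and the
"next free slot on bus 0" policy are all hypothetical; it adds no
controllers and interprets no hints, so it only shows the interface
shape, not a real policy.

```python
import xml.etree.ElementTree as ET

# Assumption: a small, illustrative subset of device elements that this
# sketch treats as PCI devices. Real libvirt rules are far more detailed.
PCI_TAGS = {"interface", "rng", "memballoon", "video"}


def wants_pci(dev: ET.Element) -> bool:
    if dev.tag == "disk":
        # Only virtio disks get a PCI address; SATA/SCSI disks use drive addresses.
        target = dev.find("target")
        return target is not None and target.get("bus") == "virtio"
    return dev.tag in PCI_TAGS


def fill_addresses(domxml: str) -> str:
    """Hypothetical helper: return the domain XML with a bus-0 PCI <address>
    added to every PCI device that lacks one. No controllers are added and
    no placement hints are honoured."""
    root = ET.fromstring(domxml)
    devices = root.find("devices")
    if devices is None:
        return domxml
    # Slots on bus 0 already claimed by explicit addresses in the input.
    used = {int(a.get("slot"), 16)
            for a in devices.iterfind("./*/address[@type='pci']")
            if int(a.get("bus", "0x00"), 16) == 0}
    slot = 1  # assumption: slot 0 is reserved for the host bridge
    for dev in devices:
        if not wants_pci(dev) or dev.find("address") is not None:
            continue
        while slot in used:
            slot += 1
        if slot > 31:
            raise ValueError("no free slot left on bus 0")
        used.add(slot)
        ET.SubElement(dev, "address", {
            "type": "pci", "domain": "0x0000", "bus": "0x00",
            "slot": f"0x{slot:02x}", "function": "0x0"})
    return ET.tostring(root, encoding="unicode")
```

The appeal of this shape is that the output is again plain domain XML,
so Mgmt can hand it to libvirt without a second translation step.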