On Wed, Jun 08, 2016 at 08:04:59PM +0200, Tomasz Flendrich wrote:
Hello everyone,

Let me introduce myself: I'm Tomasz Flendrich, a Google Summer of Code student from the University of Wrocław, Poland. My goal is to create a generic address allocator (for PCI, virtio, SCSI, etc.), because the current handling of addresses has its flaws: sometimes addresses aren't properly assigned or checked for duplicates. Additionally, a generic solution will be very useful once more hypervisors add the ability to explicitly specify device addresses.

The key goal I want to achieve at this point is defining a minimum viable product: something that can be implemented quickly, is reasonably complete on its own, and has potential for further development in a sensible direction.
Hi, sorry for the slightly late reply. We talked about the internals together quite a lot, and it's good to see the design upstream on the list so that everybody can express their opinions.
I came up with an initial design. The allocator's internal data will be kept in a struct Allocator. There will be one such structure per address type, and all of the allocators will be kept in an array.
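(For concreteness, a minimal sketch of that layout; all names here are placeholders taken from the proposal's wording, not an agreed-upon API:)

    /* One opaque allocator per address type, reachable via an array. */
    typedef struct _Allocator Allocator;

    typedef enum {
        PCI_ADDRESS_TYPE,
        VIRTIO_ADDRESS_TYPE,
        SCSI_ADDRESS_TYPE,
        IP_ADDRESS_TYPE,     /* only used by the teaching example below */

        ADDRESS_TYPE_LAST
    } AllocatorAddressType;

    Allocator *allocator[ADDRESS_TYPE_LAST];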
Just as a note for now, I won't say much about naming: 1) it could lead us into a rabbit hole with no way out, 2) it is not needed for the first RFCs, 3) you'll see how other things are named around the codebase, so you'll get a feel for what the right names should be, and finally 4) it is *very* likely that naming will be done by others and discussed more than the rest of the code ;)
We will have the possibility of adding an address pool to the allocator. For example, when we create a root PCI bus, we tell the allocator that the following PCI addresses are available: {0..0}:{0..31}.{0..7}, where {a..b} means a range from a to b, inclusive. This function call could look like this:

    allocatorAddPool(allocator[PCI_ADDRESS_TYPE],
                     &outputAddress,
                     AllocatorRange(0, 0),
                     AllocatorRange(0, 31),
                     AllocatorRange(0, 7));

The outputAddress would be an array owned by the caller (only filled in by the allocator).

Suppose we were to allocate IP addresses for computers from the range 192.168.1.2 to 192.168.1.240, where the router has to have the address 192.168.1.254:
Even though we won't be assigning IP addresses, it's a really nice analogy to explain things with, since hopefully everyone on the list more or less knows their limits and properties.
First, we reserve the router's address, both to let the allocator know that it's in use and so that collisions with manually assigned addresses can be detected:

    allocatorReserveAddress(allocator[IP_ADDRESS_TYPE],
                            &outputAddress,
                            AllocatorValue(192),
                            AllocatorValue(168),
                            AllocatorValue(1),
                            AllocatorValue(254));

Then, we assign the addresses for the computers:

    allocatorReserveAddress(allocator[IP_ADDRESS_TYPE],
                            &outputAddress,
                            AllocatorValue(192),
                            AllocatorValue(168),
                            AllocatorValue(1),
                            AllocatorRange(2, 240));

Please note the mix here: the first three octets are pinned with AllocatorValue(), while the last one may come from anywhere in the range.
Since libvirt is pure C (no C++), you need to make sure errors are caught, especially allocation errors. If AllocatorValue() is a function that can fail, then this won't work. But you probably know that, right?
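One way this concern could be addressed (a minimal sketch; the names mirror the proposal, everything else is an assumption): if AllocatorValue() and AllocatorRange() merely build a small struct by value, they have no failure path at all, and C happily passes such structs through variadic functions:

    /* Hypothetical helpers: building a struct by value cannot fail,
     * so no error checking is needed before the actual reservation. */
    typedef struct {
        unsigned long long start;
        unsigned long long end;   /* inclusive; start == end for a value */
    } AllocatorRangeT;

    static inline AllocatorRangeT
    AllocatorRange(unsigned long long start, unsigned long long end)
    {
        AllocatorRangeT r = { start, end };
        return r;
    }

    static inline AllocatorRangeT
    AllocatorValue(unsigned long long value)
    {
        return AllocatorRange(value, value);
    }

A variadic allocatorReserveAddress() could then read a known number of these structs with va_arg(ap, AllocatorRangeT), leaving the usual 0/-1 error reporting to the reservation itself.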
There could be a different function call to simply assign any free address:

    allocatorReserveAny(allocator, &outputAddress);

Let's say that we want an "sda" disk. We could create a wrapper:

    allocatorReserveSCSIAddress(allocator, "sda");

All this wrapper does is map the drive letter to an integer (e.g. 'a' = 0, 'z' = 25) and call:

    allocatorReserveAddress(allocator[SCSI_ADDRESS_TYPE], &outputAddress, 0);

If an address is already determined, because it was specified in the XML or because some specific device has to be at some specific address, we still reserve it to let the allocator know that it's in use.

How would this work internally? One possible solution is keeping a set of ranges of free addresses. For example, suppose that we have two PCI buses to use. Our address pool is stored as one range:

    {0..1} {0..31} {0..7}

Now someone reserves the address 0:13.2. Our pool of free addresses is now stored as this set of ranges:

    {0..0} {0..12}  {0..7},
    {0..0} {13..13} {0..1},
    {0..0} {13..13} {3..7},
    {0..0} {14..31} {0..7},
    {1..1} {0..31}  {0..7}

If we kept every address separately, it would require 2*32*8 = 512 entries. The set data structure from gnulib's gl_oset.h is a candidate for keeping the ranges sorted. Another candidate is simply a sorted list from gl_list.h.
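To make the splitting step concrete, here is a self-contained sketch (types and names are mine, not a proposed API) that removes one point from an n-dimensional range and prints the resulting sub-ranges; run on the pool and address above, it produces exactly the five ranges listed:

    #include <stdio.h>

    #define NDIMS 3  /* bus, slot, function for the PCI example */

    typedef struct {
        unsigned start[NDIMS];
        unsigned end[NDIMS];   /* inclusive */
    } Range;

    static void
    printRange(const Range *r)
    {
        size_t i;
        for (i = 0; i < NDIMS; i++)
            printf("{%u..%u}%s", r->start[i], r->end[i],
                   i + 1 < NDIMS ? " " : "\n");
    }

    /* Removing one point from an n-dimensional range yields up to 2*n
     * smaller ranges: for each dimension we emit the part below and
     * the part above the reserved value, with all earlier dimensions
     * pinned to the reserved value and all later ones kept whole. */
    static void
    splitAround(const Range *r, const unsigned p[NDIMS])
    {
        size_t i, j;

        for (i = 0; i < NDIMS; i++) {
            Range part = *r;

            for (j = 0; j < i; j++)
                part.start[j] = part.end[j] = p[j];

            if (p[i] > r->start[i]) {   /* the part below the point */
                part.start[i] = r->start[i];
                part.end[i] = p[i] - 1;
                printRange(&part);
            }
            if (p[i] < r->end[i]) {     /* the part above the point */
                part.start[i] = p[i] + 1;
                part.end[i] = r->end[i];
                printRange(&part);
            }
        }
    }

    int
    main(void)
    {
        Range pool = { { 0, 0, 0 }, { 1, 31, 7 } };
        unsigned reserved[NDIMS] = { 0, 13, 2 };  /* PCI address 0:13.2 */

        splitAround(&pool, reserved);
        return 0;
    }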
We cannot use those since they are GPL and the library part of libvirt is LGPL. We can either create our own structure or just use whatever is already provided. We have macros and functions for handling arrays; if you want the array sorted, then VIR_INSERT_ELEMENT and others are your friends. With arrays we would just keep pointers in them and move those around, which is not that expensive. Sparse arrays don't make sense to me in this situation. You could also convert each address to a bunch of bytes (either by specifying a new format or just converting it to a string) and use our hash table (see virHashCreate()). But I don't think we'll need to handle these, let me explain why a bit below.
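For illustration, a plain-C sketch of the sorted-array idea (inside libvirt the reallocation and shifting would be done with VIR_INSERT_ELEMENT and friends rather than raw realloc()/memmove(); the Entry type is an assumption):

    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        unsigned long long key;  /* e.g. an encoded address */
    } Entry;

    /* Insert 'e' into the sorted array '*arr' of '*n' elements,
     * keeping it sorted by key; returns 0 on success, -1 on OOM. */
    static int
    insertSorted(Entry **arr, size_t *n, Entry e)
    {
        size_t lo = 0, hi = *n;
        Entry *tmp;

        while (lo < hi) {            /* binary search for the slot */
            size_t mid = lo + (hi - lo) / 2;
            if ((*arr)[mid].key < e.key)
                lo = mid + 1;
            else
                hi = mid;
        }

        if (!(tmp = realloc(*arr, (*n + 1) * sizeof(*tmp))))
            return -1;
        *arr = tmp;

        memmove(*arr + lo + 1, *arr + lo, (*n - lo) * sizeof(**arr));
        (*arr)[lo] = e;
        (*n)++;
        return 0;
    }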
This structure would be able to handle all types of addresses that are convertible to a fixed-length list of integers. We don't mind how many of these integers there are, because we can use variadic arguments. It won't allow duplicates as long as we stick to using it in every place where we deal with addresses. It will also be very generic, with the possibility of writing wrappers on top of it that are more convenient for a particular address type. This way we would keep the qemu_* files as they are for now, and just replace the existing allocators (and manual assignment of addresses) with calls to our new allocator.
Actually, in order for us to be able to replace the calls we have there, we would need to go a slightly different route. The one I suggested would look a bit different: let's say we start by moving qemuDomainAssignSpaprVIOAddress() to, for example, src/conf/domain_addr.c and renaming it to, say, virDomainDeviceAddressAssignSpaprVIO(). Then we replace all its callers, and so on. Every time the function needs to access something in the private data of the domain (qemuDomainObjPrivate), such data would instead be moved into virDomainObj (or virDomainDef). Whenever qemu-specific information needs to be passed into the function, it could become a parameter: either a plain value, a flag, or some more complex structure, e.g. an allocopt (similarly to the xmlopt we have for parsing and formatting XML). Slowly we would get to the point where a lot of the code is converted to data that the driver stores either per-domain or globally.
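The end state for such a moved function might look roughly like this (a sketch; virDomainDefPtr and virDomainDeviceInfoPtr are existing types from src/conf/domain_conf.h, while the allocopt structure and the exact signature are invented for illustration):

    /* Hypothetical driver-provided knobs, passed in as a parameter
     * instead of being dug out of qemuDomainObjPrivate. */
    typedef struct _virDomainAddrAllocOpt virDomainAddrAllocOpt;
    typedef virDomainAddrAllocOpt *virDomainAddrAllocOptPtr;
    struct _virDomainAddrAllocOpt {
        unsigned long long defaultReg;  /* e.g. default 'reg' for SPAPR VIO */
        unsigned int flags;
    };

    /* Formerly qemuDomainAssignSpaprVIOAddress(); now hypervisor-
     * agnostic, living in src/conf/domain_addr.c. */
    int virDomainDeviceAddressAssignSpaprVIO(virDomainDefPtr def,
                                             virDomainDeviceInfoPtr info,
                                             virDomainAddrAllocOptPtr opts);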
My ideas:

- A possibility of reserving a range of addresses might be used to reserve a whole PCI slot at once.
I don't see a use case for that.
- Flags could be added, changing the behavior of particular addresses or the whole allocator.
As above, we'll most probably need that.
My questions:

- Is this approach good long-term?
- Is address releasing expected to be required at any point? (think device hotplug/unplug, etc.)
Yes, every time a device is unplugged. At first I thought it would only be needed on hot-unplug, but the address needs to be freed whenever the device is removed and the data will not be recalculated. And that can happen even on cold-unplug (virsh detach-device --config).
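So the allocator will want a release counterpart from day one; something like this hypothetical prototype (name and shape invented), callable from both the hot-unplug and the cold-unplug (--config) paths:

    /* Hypothetical: hand a device's address back to the pool.  The
     * implementation would insert the freed point as a new free range
     * and ideally merge it with any adjacent free ranges. */
    int allocatorReleaseAddress(Allocator *allocator,
                                const unsigned long long *address,
                                size_t naddress); /* one value per dimension */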
- What other features are needed?
We'll see :) Let's start with either of the approaches (I would personally suggest the one I had in mind, as I think it fits nicely with the current approaches in libvirt and will hit fewer hurdles along the way). Think it through and let us know about the next design steps.
Please speak your mind if you have remarks or thoughts about the idea; I'd really appreciate it.

Tomasz
Have a nice day, Martin