On Mon, Jun 13, 2016 at 02:48:51PM +0200, Peter Krempa wrote:
Hi list,

I'm planning on adding an API that will be used instead of virDomainSetVcpus and will allow more granular control over which virtual CPUs are enabled for a guest. The new approach will allow CPU hotplug to be used properly with NUMA guests, as the old APIs would not allow adding CPUs to very specific cgroups.
Great! We need that... Er, mgmt apps need that =)
The old APIs should still work fine with the current approach, although the final implementation should also allow unplugging vcpus from the guest by using new qemu features. I'm still not sure whether it will be possible to use this in a backward-compatible fashion, depending on how exactly this stuff will need to be set up in qemu.
If the worst comes to the worst, we can say the old API is deprecated and that it will only do basic things (as it does now). I haven't studied the code for as long as you probably did before sending this, but I don't see why it should cause any major problems.
# API #

As for the new API I'm thinking of the following design:

int virDomainVcpu(virDomainPtr domain,
                  unsigned int id,
                  unsigned int flags);

The flags for this API would be the following:

- usual domain modification impact:
  * VIR_DOMAIN_SET_VCPU_CURRENT
  * VIR_DOMAIN_SET_VCPU_LIVE
  * VIR_DOMAIN_SET_VCPU_CONFIG
- for specifying the operation (the default operation would query the cpu state):
  * VIR_DOMAIN_SET_VCPU_ENABLE
  * VIR_DOMAIN_SET_VCPU_DISABLE
- misc:
  * VIR_DOMAIN_SET_VCPU_GUEST - use the guest agent instead of ACPI hotplug
  * VIR_DOMAIN_SET_VCPU_NUMA_NODE - 'id' is the ID of a numa node where the cpu should be enabled/disabled rather than a CPU id. This is a convenience flag that will allow adding a cpu to a given numa node without having to find the correct ID.
  * VIR_DOMAIN_SET_VCPU_CORE - use thread level hotplug (see [1]). This makes sure that the CPU will be plugged in on platforms that require multiple threads to be plugged in at once.

VIR_DOMAIN_SET_VCPU_NUMA_NODE and VIR_DOMAIN_SET_VCPU_GUEST are mutually exclusive, as the guest agent doesn't report which guest numa node a CPU belongs to.
So, since the agent can only receive the number of vcpus, no new feature will be usable with this flag until that command is added to the guest agent, right? Does it make sense to have this flag for the new API then?
If the idea of one API that will both query and set is too nonconformist for our existing API design, I have no problem adding Get/Set versions and/or exploding the ADD/REMOVE flags into a separate parameter.
I thought a consensus had already been reached about what the default choice for new APIs should be. I don't remember what it was, though, as I don't feel strongly about either option.
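To make the flag rules of the proposed API a bit more concrete, here is a minimal C sketch. Only the flag names come from the proposal; the bit values, the vcpuInfo struct and the helpers checkSetVcpuFlags() and resolveNumaNodeToVcpu() are hypothetical, made up for illustration. It rejects the mutually exclusive combinations and shows one possible way VIR_DOMAIN_SET_VCPU_NUMA_NODE could turn a cell ID into a vcpu ID, assuming (arbitrarily) that the first vcpu in the cell whose state differs from the requested one is picked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names from the proposal; the bit values are hypothetical. */
enum {
    VIR_DOMAIN_SET_VCPU_CURRENT   = 0,      /* modification impact */
    VIR_DOMAIN_SET_VCPU_LIVE      = 1 << 0,
    VIR_DOMAIN_SET_VCPU_CONFIG    = 1 << 1,
    VIR_DOMAIN_SET_VCPU_ENABLE    = 1 << 2, /* operation; neither = query */
    VIR_DOMAIN_SET_VCPU_DISABLE   = 1 << 3,
    VIR_DOMAIN_SET_VCPU_GUEST     = 1 << 4, /* misc */
    VIR_DOMAIN_SET_VCPU_NUMA_NODE = 1 << 5,
    VIR_DOMAIN_SET_VCPU_CORE      = 1 << 6,
};

/* Hypothetical per-vcpu bookkeeping used by this sketch. */
typedef struct {
    bool enabled;
    unsigned int cell;   /* NUMA cell the vcpu belongs to */
} vcpuInfo;

/* Return 0 if the flag combination is acceptable, -1 otherwise. */
static int
checkSetVcpuFlags(unsigned int flags)
{
    /* ENABLE and DISABLE request opposite operations. */
    if ((flags & VIR_DOMAIN_SET_VCPU_ENABLE) &&
        (flags & VIR_DOMAIN_SET_VCPU_DISABLE))
        return -1;

    /* The guest agent doesn't report NUMA placement. */
    if ((flags & VIR_DOMAIN_SET_VCPU_GUEST) &&
        (flags & VIR_DOMAIN_SET_VCPU_NUMA_NODE))
        return -1;

    return 0;
}

/* With NUMA_NODE, 'id' names a cell: pick the first vcpu in that cell
 * whose state differs from the requested one. Returns its index, or -1
 * if every vcpu in the cell is already in the requested state. */
static int
resolveNumaNodeToVcpu(const vcpuInfo *vcpus, size_t nvcpus,
                      unsigned int cell, bool enable)
{
    assert(vcpus != NULL || nvcpus == 0);

    for (size_t i = 0; i < nvcpus; i++) {
        if (vcpus[i].cell == cell && vcpus[i].enabled != enable)
            return (int)i;
    }
    return -1;
}
```

For example, with vcpu 0 enabled in cell 0 and vcpus 1-2 disabled in cell 1, enabling a vcpu in cell 1 would resolve to vcpu 1, while enabling one in cell 0 would fail because there is nothing left to enable there.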
# XML #

The new API will require us to add new XML that will allow tracking the state of VCPUs individually. Internally we now have a data structure that keeps the relevant data in one place.

Currently we are setting data relevant to VCPUs in many places:

<domain>
  [...]
  <vcpu current='1'>3</vcpu>
  [...]
  <cputune>
    <cpupin ... />
  </cputune>
  [...]
  <cpu>
    <numa>
      <cell id='0' cpus='0' memory='102400' unit='KiB'/>
      <cell id='1' cpus='1-2' memory='102400' unit='KiB'/>
    </numa>
  </cpu>
  [...]
</domain>

As we'll be required to keep the state for every single cpu, I'm thinking of adding a new subelement called '<vcpus>' to <domain>. This will have a '<vcpu>' subelement for every configured cpu.

I'm specifically not going to add any of the cpupin or numa node ids to /domain/vcpus/vcpu as input parameters, to avoid introducing very complicated checking code that would be required to keep the data in sync.

I'm thinking of adding the numa node id as an output-only attribute, since it's relevant to the hotplug case and it's misplaced otherwise. I can certainly add the duplicated data as output-only attributes.

The XML with the new elements should look like:

<domain>
  [...]
  <vcpu current='1'>3</vcpu>
  <vcpus>
    <vcpu id='0' state='enabled'/> <!-- option 1, no extra data -->
    <vcpu id='1' state='disabled' cell='1'/> <!-- option 2, just the numa node, since it's non-obvious -->
    <vcpu id='2' state='disabled' cell='1' pin='1-2' scheduler='...'/> <!-- option 3, all the data duplicated -->
It is nice to have all the info in there, but won't it confuse users if it is output-only? Wait, let me rephrase that question. Won't it confuse users? Wait, most of our XML does already, so scratch that =) Anyway, how much duplicated info do we already have? Right now I can only think of the memory device, which we had to have anyway. Would it be too confusing to just add a <cpu/> device with all the info? That would require all the checks and a lot of unnecessary code, but it would be consistent with the memory device, and it actually is a device. Most probably not worth the pain. But OTOH, if all the data is output-only... Sorry for the ramble, just my 2 cents.
  </vcpus>
  [...]
  <cputune>
    <cpupin ... />
  </cputune>
  [...]
  <cpu>
    <numa>
      <cell id='0' cpus='0' memory='102400' unit='KiB'/>
      <cell id='1' cpus='1-2' memory='102400' unit='KiB'/>
    </numa>
  </cpu>
  [...]
</domain>

# migration #

To ensure migration compatibility, a newer libvirt will set a new migration feature flag in cases where a sparse topology was created by any means. Older versions of libvirt will reject it.

As the new cpu data will be ignored by the parser of older libvirt, we don't need to stop formatting it on migration (fortunately, schemas are not validated during migration).
Unless there are some of those loops that go through all child elements/attributes; either you'll come across them, or they will bite you in the ass during the first migration trial ;)
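A rough C sketch of how the per-vcpu state could be formatted as option 2 of the proposed <vcpus> element, together with one possible reading of the "sparse topology" condition from the migration section. Everything here is hypothetical: vcpuState, appendf(), formatVcpus() and vcpusAreSparse() are illustrative names, not libvirt code. The sparse check assumes that older libvirt can only express the case where vcpus 0..current-1 are enabled (via <vcpu current='N'>), so any enabled vcpu that follows a disabled one would need the new migration feature flag.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-vcpu record; field names are illustrative only. */
typedef struct {
    unsigned int id;
    bool enabled;
    int cell;            /* NUMA cell, or -1 when the guest has none */
} vcpuState;

/* Append printf-style text at *off; returns false on truncation. */
static bool
appendf(char *buf, size_t buflen, size_t *off, const char *fmt, ...)
{
    va_list ap;
    int r;

    if (*off >= buflen)
        return false;
    va_start(ap, fmt);
    r = vsnprintf(buf + *off, buflen - *off, fmt, ap);
    va_end(ap);
    if (r < 0 || (size_t)r >= buflen - *off)
        return false;
    *off += (size_t)r;
    return true;
}

/* Format <vcpus> as in option 2: state plus an output-only cell. */
static bool
formatVcpus(char *buf, size_t buflen, const vcpuState *vcpus, size_t n)
{
    size_t off = 0;

    if (!appendf(buf, buflen, &off, "<vcpus>\n"))
        return false;
    for (size_t i = 0; i < n; i++) {
        if (!appendf(buf, buflen, &off, "  <vcpu id='%u' state='%s'",
                     vcpus[i].id, vcpus[i].enabled ? "enabled" : "disabled"))
            return false;
        /* Omit the cell attribute when the guest has no NUMA topology. */
        if (vcpus[i].cell >= 0 &&
            !appendf(buf, buflen, &off, " cell='%d'", vcpus[i].cell))
            return false;
        if (!appendf(buf, buflen, &off, "/>\n"))
            return false;
    }
    return appendf(buf, buflen, &off, "</vcpus>\n");
}

/* Assumed meaning of "sparse": the enabled vcpus are not exactly
 * ids 0..k-1, i.e. an enabled vcpu follows a disabled one. The
 * contiguous case is what <vcpu current='k'> can already express.
 * Expects the array to be sorted by id. */
static bool
vcpusAreSparse(const vcpuState *vcpus, size_t n)
{
    bool seenDisabled = false;

    assert(vcpus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!vcpus[i].enabled)
            seenDisabled = true;
        else if (seenDisabled)
            return true;
    }
    return false;
}
```

For the example topology in the XML section (vcpu 0 enabled, vcpus 1 and 2 disabled), vcpusAreSparse() returns false; enabling vcpu 2 while vcpu 1 stays disabled would make it return true, which is the case older libvirt cannot represent.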
# qemu/platform implementation caveats #

When starting the VM for the first time, it might be necessary to start a throw-away qemu process to query some details that we'll need to pass in on the command line. I'm not sure if this is still necessary, and I'll try to avoid it at all costs.
I hope capabilities will tell us what we need. If not, I hope it can be added.
[1] Some architectures (ppc64) don't directly support thread-level hotplug and thus require us to plug in a whole core, which translates into multiple threads (8 in the case of POWER8). There may be other, as yet unknown, problems.
Fingers crossed for the least amount of those.
Thanks for your feedback.

Peter

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list