Hi list,

I'm planning on adding an API that will be used instead of
virDomainSetVcpus and will allow more granular control over which
virtual CPUs are enabled for a guest.

The new approach will make it possible to use cpu hotplug properly with
NUMA guests, as the old APIs do not allow adding CPUs to specific
cgroups.

The old APIs should keep working with the current approach, although the
final implementation should also allow unplugging vcpus from the guest
by using new qemu features. I'm still not sure whether it will be
possible to do this in a backward compatible fashion; that depends on
how exactly this will need to be set up in qemu.

# API #

As for the new API, I'm thinking of the following design:

    int
    virDomainVcpu(virDomainPtr domain,
                  unsigned int id,
                  unsigned int flags);

The flags for this API would be the following:

- usual domain modification impact:
    * VIR_DOMAIN_SET_VCPU_CURRENT
    * VIR_DOMAIN_SET_VCPU_LIVE
    * VIR_DOMAIN_SET_VCPU_CONFIG

- for specifying the operation, as the default operation would query
  the cpu state:
    * VIR_DOMAIN_SET_VCPU_ENABLE
    * VIR_DOMAIN_SET_VCPU_DISABLE

- misc:
    * VIR_DOMAIN_SET_VCPU_GUEST - use the guest agent instead of ACPI
      hotplug
    * VIR_DOMAIN_SET_VCPU_NUMA_NODE - 'id' is the ID of a numa node
      where the cpu should be enabled/disabled rather than a CPU id.
      This is a convenience flag that allows adding a cpu to a given
      numa node without having to find the correct CPU id.
    * VIR_DOMAIN_SET_VCPU_CORE - use core-level hotplug (see [1]). This
      makes sure that the CPU will be plugged in on platforms that
      require plugging in multiple threads at once.

VIR_DOMAIN_SET_VCPU_NUMA_NODE and VIR_DOMAIN_SET_VCPU_GUEST are mutually
exclusive, as the guest agent doesn't report which guest numa node a CPU
belongs to.

If the idea of one API that both queries and sets is too nonconformist
for our existing API design, I have no problem adding Get/Set versions
and/or splitting the ENABLE/DISABLE flags out into a separate parameter.
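To make the flag semantics above a bit more concrete, here is a rough,
self-contained sketch of how the operation could be derived from 'flags'
and how the exclusivity rule could be checked. This is not libvirt code.
The numeric flag values, the sketchVcpuOp type and both helper functions
are made up for illustration, and treating ENABLE together with DISABLE
as an error is my assumption; the proposal only states that NUMA_NODE
and GUEST are mutually exclusive.

```c
#include <stdbool.h>

/* Flag names are taken from the proposal; the bit values are assumed,
 * since the proposal does not assign any. */
enum {
    VIR_DOMAIN_SET_VCPU_CURRENT   = 0,
    VIR_DOMAIN_SET_VCPU_LIVE      = 1 << 0,
    VIR_DOMAIN_SET_VCPU_CONFIG    = 1 << 1,
    VIR_DOMAIN_SET_VCPU_ENABLE    = 1 << 2,
    VIR_DOMAIN_SET_VCPU_DISABLE   = 1 << 3,
    VIR_DOMAIN_SET_VCPU_GUEST     = 1 << 4,
    VIR_DOMAIN_SET_VCPU_NUMA_NODE = 1 << 5,
    VIR_DOMAIN_SET_VCPU_CORE      = 1 << 6,
};

/* Hypothetical helper type: the operation selected by the flags. */
typedef enum {
    SKETCH_VCPU_OP_QUERY,    /* default when neither ENABLE nor DISABLE is set */
    SKETCH_VCPU_OP_ENABLE,
    SKETCH_VCPU_OP_DISABLE,
    SKETCH_VCPU_OP_INVALID,  /* assumption: ENABLE and DISABLE together */
} sketchVcpuOp;

/* Hypothetical helper: map the flags to the requested operation. */
sketchVcpuOp
sketchVcpuOpFromFlags(unsigned int flags)
{
    bool enable = (flags & VIR_DOMAIN_SET_VCPU_ENABLE) != 0;
    bool disable = (flags & VIR_DOMAIN_SET_VCPU_DISABLE) != 0;

    if (enable && disable)
        return SKETCH_VCPU_OP_INVALID;
    if (enable)
        return SKETCH_VCPU_OP_ENABLE;
    if (disable)
        return SKETCH_VCPU_OP_DISABLE;

    return SKETCH_VCPU_OP_QUERY;
}

/* Hypothetical helper: reject flag combinations that cannot work. */
bool
sketchVcpuFlagsValid(unsigned int flags)
{
    if (sketchVcpuOpFromFlags(flags) == SKETCH_VCPU_OP_INVALID)
        return false;

    /* Stated in the proposal: the guest agent does not report the numa
     * node of a cpu, so NUMA_NODE and GUEST cannot be combined. */
    if ((flags & VIR_DOMAIN_SET_VCPU_NUMA_NODE) &&
        (flags & VIR_DOMAIN_SET_VCPU_GUEST))
        return false;

    return true;
}
```

With these helpers, a call passing only VIR_DOMAIN_SET_VCPU_LIVE would be
treated as a query, while VIR_DOMAIN_SET_VCPU_GUEST combined with
VIR_DOMAIN_SET_VCPU_NUMA_NODE would be rejected before touching the
guest.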
# XML #

The new API will require us to add new XML that will allow tracking the
state of VCPUs individually. Internally we now have a data structure
that keeps the relevant data in one place. In the XML, however, data
relevant to VCPUs is currently set in many places:

<domain>
  [...]
  <vcpu current='1'>3</vcpu>
  [...]
  <cputune>
    <cpupin ... />
  </cputune>
  [...]
  <cpu>
    <numa>
      <cell id='0' cpus='0' memory='102400' unit='KiB'/>
      <cell id='1' cpus='1-2' memory='102400' unit='KiB'/>
    </numa>

As we'll be required to keep the state for every single cpu, I'm
thinking of adding a new subelement called '<vcpus>' to <domain>. This
will have a '<vcpu>' subelement for every configured cpu.

I'm specifically not going to add any of the cpupin or numa node ids to
/domain/vcpus/vcpu as input parameters, to avoid introducing the very
complicated checking code that would be required to keep the data in
sync.

I'm thinking of adding the numa node id as an output-only attribute,
since it's relevant to the hotplug case and it's misplaced otherwise. I
certainly can add the duplicated data as output-only attributes as well.

The XML with the new elements should look like:

<domain>
  [...]
  <vcpu current='1'>3</vcpu>
  <vcpus>
    <!-- option 1: no extra data -->
    <vcpu id='0' state='enabled'/>
    <!-- option 2: just the numa node, since it's non-obvious -->
    <vcpu id='1' state='disabled' cell='1'/>
    <!-- option 3: all the data duplicated -->
    <vcpu id='2' state='disabled' cell='1' pin='1-2' scheduler='...'/>
  </vcpus>
  [...]
  <cputune>
    <cpupin ... />
  </cputune>
  [...]
  <cpu>
    <numa>
      <cell id='0' cpus='0' memory='102400' unit='KiB'/>
      <cell id='1' cpus='1-2' memory='102400' unit='KiB'/>
    </numa>

# migration #

To ensure migration compatibility, a new libvirt will set a new
migration feature flag in cases where a sparse topology was created by
any means. Older versions of libvirt will reject it.

As the new cpu data will be ignored by the parser of older libvirt, we
don't need to stop formatting it on migration.
(Fortunately, schemas are not validated during migration.)

# qemu/platform implementation caveats #

When starting the VM for the first time it might be necessary to start a
throw-away qemu process to query some details that we'll need to pass in
on the command line. I'm not sure if this is still necessary and I'll
try to avoid it at all costs.

[1] Some architectures (ppc64) don't directly support thread-level
hotplug and thus require us to plug in a whole core, which translates
into multiple threads (8 in the case of POWER8).

Possibly other yet unknown problems.

Thanks for your feedback.

Peter

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list