Re: [Qemu-devel] Using Linux's CPUSET for KVM VCPUs

On Thu, Jul 22, 2010 at 04:03:13PM +0200, Andre Przywara wrote:
> Hi all,
> 
> while working on NUMA host pinning, I experimented with vCPU affinity 
> within QEMU, but left it alone as it would complicate the code and would 
> not give a better experience than using taskset with the monitor-provided 
> thread IDs, as is done currently. During that work I looked at 
> Linux's CPUSET implementation 
> (/src/linux-2.6/Documentation/cgroups/cpusets.txt).
> In brief, this is a pseudo-filesystem-based, truly hierarchical 
> mechanism for restricting a set of processes (or threads; it uses 
> PIDs) to a certain subset of the machine.
> Sadly we cannot leverage this for true guest NUMA memory assignment, but 
> it would work nicely for pinning (or not pinning) guest vCPUs. 

IIUC the 'cpuset.mems' tunable lets you control which NUMA node
memory allocations will come from. It isn't as flexible as numactl
policies, since you can't request interleaving, but if you're just
looking to control node locality I think it would do.
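
Roughly, driving that from the shell might look like this (a sketch only;
the cpuset mount point and the per-guest group name are assumptions):

  # assume the cpuset controller is mounted at /dev/cpuset and a
  # per-guest group kvm_guest_01 already exists
  echo 0   > /dev/cpuset/kvm_guest_01/cpuset.mems   # allocate from node 0
  echo 0-3 > /dev/cpuset/kvm_guest_01/cpuset.cpus   # run on host CPUs 0-3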

> I had the following structure in mind:
> For each guest there is a new CPUSET (mkdir $CPUSET_MNT/`cat 
> /proc/$$/cpuset`/kvm_$guestname). One could then assign the guest's 
> global resources to this CPUSET.
> For each vCPU there is a separate CPUSET located under the guest's 
> global one. This would allow easy manipulation of the pinning of vCPUs, 
> even from the console without any management app (although this could 
> easily be implemented in libvirt).

FYI, if you have any cgroup controllers mounted, libvirt will already
automatically create a dedicated sub-group for every guest you run.
The main reason we use cgroups is that they let us apply controls to a
group of PIDs at once (eg cpu.shares for all threads within QEMU,
instead of nice(2) on each individual thread). At the individual vCPU
level we are dealing with single PIDs again, so libvirt hasn't needed
further cgroup subdivision there; it just uses the traditional Linux
APIs instead.
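
As a rough illustration of the per-guest control (the mount point and
group path below are assumptions, not necessarily libvirt's actual layout):

  # one knob covers every thread in the guest's QEMU process ...
  echo 512 > /cgroup/cpu/libvirt/qemu/guest01/cpu.shares
  # ... instead of renicing each TID listed under /proc/$QEMU_PID/task/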

> /
> |
> +--/ kvm_guest_01
> |  |
> |  +-- VCPU0
> |  |
> |  +-- VCPU1
> |
> +--/ kvm_guest_02
> ...
> 
> What do you think about it? Is it worth implementing this?

Having at least one cgroup per guest has certainly proved valuable for
libvirt's needs. If you're not using a mgmt API, exposing vCPUs (and other
internal QEMU threads) via named sub-cgroups could be quite convenient.
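
For example, a per-vCPU layout could look roughly like this (the group
names and the thread id are hypothetical; the TID would come from the
monitor's "info cpus" output):

  mkdir /dev/cpuset/kvm_guest_01/vcpu0
  echo 2 > /dev/cpuset/kvm_guest_01/vcpu0/cpuset.cpus   # pin to host CPU 2
  echo 0 > /dev/cpuset/kvm_guest_01/vcpu0/cpuset.mems   # mems must be set too
  echo $VCPU0_TID > /dev/cpuset/kvm_guest_01/vcpu0/tasks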

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

