Anthony Liguori wrote:
On 06/23/2010 04:09 PM, Andre Przywara wrote:
Hi,
these three patches add basic NUMA pinning to KVM. According to a
user-provided assignment, parts of the guest's memory will be bound to
different host nodes. This should increase performance in large virtual
machines and on loaded hosts.
These patches are quite basic (but they work), and I am sending them as
an RFC to get some feedback before implementing things in vain.
>> ....
Please comment on the approach in general and the implementation.
If we extended -mem-path and integrated it with -numa such that a different
path could be used with each numa node (and we let an explicit file be
specified instead of just a directory), then if I understand correctly,
we could use numactl without any specific integration in qemu. Does
this sound correct?
In general, yes. But I consider the whole hugetlbfs approach broken.
Since 2.6.32 or so you can use MAP_HUGETLB together with MAP_ANONYMOUS
in mmap() to avoid hugetlbfs altogether, and I bet that the future will
hold transparent hugepages anyway (RHEL6 already has them).
I am not sure whether you want to keep the -memfile option and extend it
with some pseudo compat glue (faked directory names to be interpreted by
QEMU) to make it work in the future. But anyway in these cases the
external numactl approach would not work anymore.
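To make the MAP_HUGETLB point concrete, here is a minimal sketch of an anonymous huge page allocation with a fallback to normal pages. The helper name alloc_guest_ram() and its got_huge parameter are hypothetical and not taken from QEMU or from the patches; the sketch only assumes a Linux host whose kernel and libc headers provide MAP_HUGETLB. Because no file is involved, there is nothing for an external numactl --file call to attach a policy to.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not a QEMU function): allocate guest RAM as an
 * anonymous mapping, preferring huge pages and falling back to normal
 * pages. Returns NULL on failure; *got_huge reports which path was taken.
 */
static void *alloc_guest_ram(size_t size, int *got_huge)
{
    void *p = MAP_FAILED;

#ifdef MAP_HUGETLB
    /*
     * No hugetlbfs mount is needed here. The call fails if the host has
     * no huge pages available or if size is not a multiple of the huge
     * page size.
     */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
    *got_huge = (p != MAP_FAILED);

    if (p == MAP_FAILED) {
        /* Fall back to ordinary anonymous memory. */
        p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    }
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would pass a size that is a multiple of the host's huge page size and release the region with munmap() when done.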
IOW:
qemu -numa node,mem=1G,nodeid=0,cpus=0-1,memfile=/dev/shm/node0.mem
-numa node,mem=2G,nodeid=1,cpus=1-2,memfile=/dev/shm/node1.mem
It's then possible to say:
numactl --file /dev/shm/node0.mem --interleave=0,1
numactl --file /dev/shm/node1.mem --membind=2
I think this approach is nicer because it gives the user a lot more
flexibility without having us chase other tools like numactl. For
instance, your patches only support pinning and not interleaving.
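As a rough illustration of the file-per-node idea above, the following sketch maps an explicit file as shared memory, the way a memfile= suboption might back one guest node. The helper name map_node_file() is hypothetical, and memfile= is only the proposal from this mail, not an existing QEMU option. The sketch assumes the file lives on tmpfs (such as /dev/shm) and that a policy set beforehand with numactl --file applies to pages faulted in through a MAP_SHARED mapping of that file.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: back one guest NUMA node with an explicit file.
 * Returns the mapped address, or NULL on failure.
 */
static void *map_node_file(const char *path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        return NULL;
    }

    /* Size the file to the node's memory size. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }

    /* MAP_SHARED, so the pages belong to the file rather than to a
     * private copy owned by this process. */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}
```

With this split, QEMU would only need to know the path and size per node, and all placement decisions would stay with the external tool.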
That's right. I put it on the list ;-)
Thanks for the good hint on the huge pages issue, as this is not
properly handled in the current implementation. I will think about a
proper way to handle this, but would still opt for an (at least
partially) QEMU-integrated solution.
Still open for discussion, though, as I see your point of avoiding
a duplicate NUMA implementation in both numactl and QEMU.
Regards,
Andre.
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 488-3567-12