On 24.06.2010, at 00:21, Anthony Liguori wrote:

> On 06/23/2010 04:09 PM, Andre Przywara wrote:
>> Hi,
>>
>> these three patches add basic NUMA pinning to KVM. According to a
>> user-provided assignment, parts of the guest's memory will be bound to
>> different host nodes. This should increase performance in large virtual
>> machines and on loaded hosts.
>> These patches are quite basic (but work) and I send them as an RFC to
>> get some feedback before implementing stuff in vain.
>>
>> To use it you need to provide a guest NUMA configuration, which could be
>> as simple as "-numa node -numa node" to give two nodes in the guest.
>> Then you pin these nodes with a separate command line option to
>> different host nodes: "-numa pin,nodeid=0,host=0 -numa pin,nodeid=1,host=2"
>> This separation of host and guest config sounds a bit complicated, but
>> was demanded the last time I submitted a similar version.
>> I refrained from binding the vCPUs to physical CPUs for now, but this
>> can be added later with a "cpubind" option to "-numa pin,". Also, this
>> could be done from a management application by using sched_setaffinity().
>>
>> Please note that this is currently made for qemu-kvm, although I am not
>> up to date regarding the current status of upstream QEMU's true SMP
>> capabilities. The final patch will be made against upstream QEMU anyway.
>> Also, this is currently for Linux hosts (any other KVM hosts alive?) and
>> for PC guests only. I think both can be fixed easily if someone requests
>> it (and gives me a pointer to further information).
>>
>> Please comment on the approach in general and the implementation.
>
> If we extended the -mem-path integration with -numa so that a different
> path could be used with each NUMA node (and we let an explicit file be
> specified instead of just a directory), then, if I understand correctly,
> we could use numactl without any specific integration in qemu. Does this
> sound correct?
>
> IOW:
>
> qemu -numa node,mem=1G,nodeid=0,cpus=0-1,memfile=/dev/shm/node0.mem -numa node,mem=2G,nodeid=1,cpus=1-2,memfile=/dev/shm/node1.mem
>
> It's then possible to say:
>
> numactl --file /dev/shm/node0.mem --interleave=0,1
> numactl --file /dev/shm/node1.mem --membind=2
>
> I think this approach is nicer because it gives the user a lot more
> flexibility without having us chase other tools like numactl. For
> instance, your patches only support pinning and not interleaving.

Interesting idea. So who would create the /dev/shm/nodeXX files? I can
imagine starting numactl before qemu, even though that's cumbersome. I
don't think it's feasible to start numactl after qemu is running. That
would involve way too much magic; I'd rather have qemu call numactl itself.


Alex
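
[Editor's illustration] The vCPU binding mentioned above (a management
application using sched_setaffinity()) could look roughly like the untested
sketch below. The thread ID and CPU number are placeholders; a real tool
would first obtain the vCPU thread IDs from qemu before calling this.

/* Sketch only: pin one (vCPU) thread to one physical CPU. */
#define _GNU_SOURCE
#include <sched.h>      /* cpu_set_t, CPU_ZERO, CPU_SET, sched_setaffinity */
#include <sys/types.h>  /* pid_t */
#include <stdio.h>

static int pin_thread_to_cpu(pid_t tid, int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* tid == 0 means "the calling thread". */
    if (sched_setaffinity(tid, sizeof(set), &set) < 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}

int main(void)
{
    /* Placeholder: pin the calling thread to CPU 2 as a demonstration. */
    return pin_thread_to_cpu(0, 2) ? 1 : 0;
}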
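
[Editor's illustration] For the memory side, if qemu were to do the binding
itself instead of relying on an external numactl run, one standard mechanism
is the mbind(2) syscall from libnuma's <numaif.h> (link with -lnuma). This
is only a minimal, untested sketch; the region size and node number are
placeholders and not taken from the patches under discussion.

/* Sketch only: restrict one anonymous region to a single host NUMA node. */
#include <stdio.h>
#include <sys/mman.h>
#include <numaif.h>     /* mbind(), MPOL_BIND */

static int bind_region_to_host_node(void *addr, size_t len, int host_node)
{
    unsigned long nodemask = 1UL << host_node;   /* assumes host_node < 64 */

    /* MPOL_BIND restricts allocations in [addr, addr+len) to host_node. */
    if (mbind(addr, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0) < 0) {
        perror("mbind");
        return -1;
    }
    return 0;
}

int main(void)
{
    size_t len = 256 * 1024 * 1024;   /* stand-in for one guest node's RAM */
    void *ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    return bind_region_to_host_node(ram, len, 0) ? 1 : 0;
}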