On 24.06.2010, at 00:21, Anthony Liguori wrote:

> On 06/23/2010 04:09 PM, Andre Przywara wrote:
>> Hi,
>>
>> these three patches add basic NUMA pinning to KVM. According to a
>> user-provided assignment, parts of the guest's memory will be bound to
>> different host nodes. This should increase performance in large virtual
>> machines and on loaded hosts.
>> These patches are quite basic (but work) and I send them as an RFC to
>> get some feedback before implementing stuff in vain.
>>
>> To use it you need to provide a guest NUMA configuration, which could be
>> as simple as "-numa node -numa node" to give two nodes in the guest.
>> Then you pin these nodes with a separate command line option to
>> different host nodes: "-numa pin,nodeid=0,host=0 -numa pin,nodeid=1,host=2"
>> This separation of host and guest config sounds a bit complicated, but
>> was demanded the last time I submitted a similar version.
>> I refrained from binding the vCPUs to physical CPUs for now, but this
>> can be added later with a "cpubind" option to "-numa pin,". Also, this
>> could be done from a management application by using sched_setaffinity().
>>
>> Please note that this is currently made for qemu-kvm, although I am not
>> up to date regarding the current status of upstream QEMU's true SMP
>> capabilities. The final patch will be made against upstream QEMU anyway.
>> Also, this is currently for Linux hosts (any other KVM hosts alive?) and
>> for PC guests only. I think both can be fixed easily if someone requests
>> it (and gives me a pointer to further information).
>>
>> Please comment on the approach in general and the implementation.
>
> If we extended the -mem-path integration with -numa so that a different
> path could be used with each NUMA node (and we let an explicit file be
> specified instead of just a directory), then, if I understand correctly,
> we could use numactl without any specific integration in qemu. Does this
> sound correct?
>
> IOW:
>
> qemu -numa node,mem=1G,nodeid=0,cpus=0-1,memfile=/dev/shm/node0.mem -numa node,mem=2G,nodeid=1,cpus=1-2,memfile=/dev/shm/node1.mem
>
> It's then possible to say:
>
> numactl --file /dev/shm/node0.mem --interleave=0,1
> numactl --file /dev/shm/node1.mem --membind=2
>
> I think this approach is nicer because it gives the user a lot more
> flexibility without having us chase other tools like numactl. For
> instance, your patches only support pinning and not interleaving.

Interesting idea. So who would create the /dev/shm/nodeXX files? I can
imagine starting numactl before qemu, even though that's cumbersome. I
don't think it's feasible to start numactl after qemu is running. That
would involve way too much magic; I'd rather have qemu call numactl itself.


Alex
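
[Editor's illustration] The vCPU binding mentioned above (a management
application using sched_setaffinity()) could look roughly like the untested
sketch below. The thread ID and CPU number are placeholders; a real tool
would first obtain the vCPU thread IDs from qemu before calling this.

/* Sketch only: pin one (vCPU) thread to one physical CPU. */
#define _GNU_SOURCE
#include <sched.h>      /* cpu_set_t, CPU_ZERO, CPU_SET, sched_setaffinity */
#include <sys/types.h>  /* pid_t */
#include <stdio.h>

static int pin_thread_to_cpu(pid_t tid, int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* tid == 0 means "the calling thread". */
    if (sched_setaffinity(tid, sizeof(set), &set) < 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}

int main(void)
{
    /* Placeholder: pin the calling thread to CPU 2 as a demonstration. */
    return pin_thread_to_cpu(0, 2) ? 1 : 0;
}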
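
[Editor's illustration] For the memory side, if qemu were to do the binding
itself instead of relying on an external numactl run, one standard mechanism
is the mbind(2) syscall from libnuma's <numaif.h> (link with -lnuma). This
is only a minimal, untested sketch; the region size and node number are
placeholders and not taken from the patches under discussion.

/* Sketch only: restrict one anonymous region to a single host NUMA node. */
#include <stdio.h>
#include <sys/mman.h>
#include <numaif.h>     /* mbind(), MPOL_BIND */

static int bind_region_to_host_node(void *addr, size_t len, int host_node)
{
    unsigned long nodemask = 1UL << host_node;   /* assumes host_node < 64 */

    /* MPOL_BIND restricts allocations in [addr, addr+len) to host_node. */
    if (mbind(addr, len, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0) < 0) {
        perror("mbind");
        return -1;
    }
    return 0;
}

int main(void)
{
    size_t len = 256 * 1024 * 1024;   /* stand-in for one guest node's RAM */
    void *ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    return bind_region_to_host_node(ram, len, 0) ? 1 : 0;
}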