On Wed, Nov 30, 2011 at 09:52:37PM +0530, Dipankar Sarma wrote:
> create the guest topology correctly and optimize for NUMA. This
> would work for us.

Even in the case of 1 guest that fits in one node, you're not going to max out the full bandwidth of all memory channels with this.

All qemu can do with ms_mbind/tbind is create a vtopology that matches the hardware topology. It has these limits:

1) it requires all userland applications to be modified to scan either the physical topology (if run on the host) or the vtopology (if run in the guest) to get the full benefit

2) it breaks across live migration if the host physical topology changes

3) 1 small guest on an idle NUMA system that fits in one NUMA node won't tell the host kernel enough information

4) if used outside of qemu, and one thread allocates more memory than fits in one node, it won't tell the host kernel enough information

About 3): if you've just one guest that fits in one node, each vcpu should probably be spread across all the nodes and behave like MADV_INTERLEAVE. If the guest CPU scheduler then migrates guest processes in reverse, the global memory bandwidth will still be fully used, even if they both access remote memory. I've just seen benchmarks where no pinning runs more than _twice_ as fast as pinning, with just 1 guest and only 10 vcpu threads, probably because of that.

About 4): even if the thread scans the NUMA topology, it won't be able to tell the kernel enough about which parts of the memory may be used more or less (ok, it may be possible to call mbind and vary it at runtime, but that leaves even more complexity to the programmer).

If the vcpu is free to go to any node, and we've an automatic vcpu<->memory affinity, then the memory will follow the vcpu. And the scheduler domains should already optimize for maxing out the full memory bandwidth of all channels.

Troubles 1/2/3/4 apply to the hard bindings as well, not just to mbind/tbind.
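To put rough numbers on point 3), here is a toy model (made-up figures, not kernel code) of why spreading a small guest's memory across all nodes can beat binding it to one node. It assumes each node's memory controller serves only the pages placed on that node, in parallel with the other nodes, so aggregate bandwidth is bounded by the busiest node. Remote-access latency and interconnect bandwidth are ignored, and those eat into the gain on real hardware. NR_NODES, NR_PAGES, CHANNEL_GBPS and the function names are hypothetical, chosen for illustration only.

```python
# Toy model, not kernel code: hypothetical numbers, remote-access latency
# and interconnect limits are ignored.
NR_NODES = 4          # NUMA nodes, one memory controller each (assumed)
NR_PAGES = 1024       # guest working set, in equal-sized pages (assumed)
CHANNEL_GBPS = 10.0   # bandwidth one node's controller can serve (assumed)


def place_single_node(page: int) -> int:
    """Hard binding: every page lives on node 0."""
    return 0


def place_interleaved(page: int) -> int:
    """Round-robin placement across nodes, in the spirit of interleaving."""
    return page % NR_NODES


def effective_gbps(place) -> float:
    """Aggregate bandwidth when the whole working set is streamed once.

    Each node serves its own pages at CHANNEL_GBPS, in parallel with the
    others, so the run lasts as long as the most loaded node needs.
    """
    pages_on_node = [0] * NR_NODES
    for page in range(NR_PAGES):
        pages_on_node[place(page)] += 1
    busiest = max(pages_on_node)
    # total data / elapsed time, both expressed in page units
    return NR_PAGES * CHANNEL_GBPS / busiest


if __name__ == "__main__":
    print(f"single node: {effective_gbps(place_single_node):.1f} GB/s")
    print(f"interleaved: {effective_gbps(place_interleaved):.1f} GB/s")
```

With 4 nodes this prints 10.0 GB/s for the single-node placement and 40.0 GB/s for the interleaved one. Real systems won't scale that cleanly, but the model shows why an unpinned guest whose memory ends up spread over all nodes can use more of the total memory bandwidth than a guest bound to one node.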
In short it's an incremental step that moves some logic to the kernel but I don't see it solving all situations optimally and it shares a lot of the limits of the hard bindings.