On Thu, Jul 15, 2010 at 07:10:35PM +0200, Ralf Spenneberg wrote:
> Hi,
>
> I just had a chance to play with KVM on Ubuntu 10.04 LTS on some new HP
> 360 g6 with Nehalem processors. I have a feeling that KVM and NUMA on
> these machines do not play well together.
>
> Doing some benchmarks I got bizarre numbers. Sometimes the VMs were
> performing fine and sometimes the performance was very bad! Apparently
> KVM does not recognize the NUMA architecture and places memory and
> processes randomly, and therefore often on different NUMA cells.
>
> First a couple of specs of the machine:
> Two Nehalem sockets with E5520, Hyperthreading turned off, 4 cores per
> socket, all in all 8 processors.
>
> Linux recognizes the NUMA architecture:
>
> # numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 2 4 6
> node 0 size: 12277 MB
> node 0 free: 9183 MB
> node 1 cpus: 1 3 5 7
> node 1 size: 12287 MB
> node 1 free: 8533 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10

If numactl --hardware works, then libvirt should work too, since libvirt
uses the numactl library to query the topology.

> So I have got two cells with 4 cores each.
>
> Virsh does not recognize the topology:
>
> # virsh capabilities
> <capabilities>
>   <host>
>     <cpu>
>       <arch>x86_64</arch>
>       <model>core2duo</model>
>       <topology sockets='2' cores='4' threads='1'/>
>       <feature name='lahf_lm'/>
> ..

The NUMA topology does not get put inside the <cpu> element. It is one
level up, in a <topology> element, e.g.

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      ....snip....
    </cpu>
    ...snip...
    <topology>
      <cells num='2'>
        <cell id='0'>
          <cpus num='4'>
            <cpu id='0'/>
            <cpu id='1'/>
            <cpu id='2'/>
            <cpu id='3'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='4'>
            <cpu id='4'/>
            <cpu id='5'/>
            <cpu id='6'/>
            <cpu id='7'/>
          </cpus>
        </cell>
      </cells>
    </topology>

This shows 2 NUMA nodes (cells in libvirt terminology), each with 4 CPUs.

You can also query the free RAM in each node/cell:

# virsh freecell 0
0: 1922084 kB

# virsh freecell 1
1: 1035700 kB

From both of these you can then decide where to place the guest.

> I guess this is the case, because QEMU does not recognize the
> NUMA architecture (QEMU monitor):
>
> (qemu) info numa
> 0 nodes

IIRC this is reporting the guest NUMA topology, which is completely
independent of the host NUMA topology.

> So apparently KVM does not utilize the NUMA architecture. Did I do
> something wrong? Is KVM missing a patch? Do I need to activate something
> in KVM to recognize the NUMA architecture?

There are two aspects to NUMA:

 1. Placing QEMU on appropriate NUMA nodes.
 2. Defining the guest NUMA topology.

By default QEMU will float freely across any CPUs, and all the guest RAM
will appear in one node. This can be bad for performance, especially if
you are benchmarking.

So for performance testing you definitely want to bind QEMU to the CPUs
within a single NUMA node at startup. This means that all memory accesses
are local to that node, unless you give the guest more virtual RAM than
there is free RAM on the local NUMA node.

Since you suggest you're using libvirt, the low-level way to do this is
via the <vcpu> element in the guest XML.

In my capabilities XML example above you can see 2 NUMA nodes, each with
4 CPUs. So if I wanted to restrict the guest to the first NUMA node, which
has CPU numbers 0, 1, 2, 3, then I'd do:

<domain type='kvm' id='8'>
  <name>rhel6x86_64</name>
  <uuid>0bbf8187-bce1-bc77-2a2c-fb033816f7f4</uuid>
  <memory>819200</memory>
  <currentMemory>819200</currentMemory>
  <vcpu cpuset='0-3'>2</vcpu>
  ...snip...
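As a side note, if the guest is already running you can get much the same
effect per-vCPU with virsh vcpupin instead of editing the XML. A rough
sketch, reusing the example domain name from the XML above (IIRC this only
affects the running domain, whereas the cpuset attribute is applied at
startup):

# virsh vcpupin rhel6x86_64 0 0-3
# virsh vcpupin rhel6x86_64 1 0-3

This pins vCPU 0 and vCPU 1 to host CPUs 0-3.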
You can verify the pinning with virsh vcpuinfo:

# virsh vcpuinfo rhel5xen
VCPU:           0
CPU:            1
State:          running
CPU time:       15.9s
CPU Affinity:   yyyy----

VCPU:           1
CPU:            2
State:          running
CPU time:       9.5s
CPU Affinity:   yyyy----
....snip rest...

It is not yet possible to define the guest-visible NUMA topology via
libvirt, but that shouldn't be too critical for performance unless you
need your guest to be able to span multiple host nodes.

For further performance you also really want to enable hugepages on your
host (e.g. mount hugetlbfs at /dev/hugepages), then restart the libvirtd
daemon, and then add the following to your guest XML just after the
<memory> element (a rough sketch of the host-side setup is appended
below my sig):

  <memoryBacking>
    <hugepages/>
  </memoryBacking>

This will make it pre-allocate hugepages for all guest RAM at startup.
NB the downside is that you can't overcommit RAM, but that's a tradeoff
between maximising utilization and maximising performance.

Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
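The host-side hugepage setup mentioned above might look roughly like the
following; the page count is only an illustration (with 2 MB pages, 4096
pages = 8 GB), so size it to cover the RAM of the guests you want backed
by hugepages:

# mkdir -p /dev/hugepages
# mount -t hugetlbfs hugetlbfs /dev/hugepages
# echo 4096 > /proc/sys/vm/nr_hugepages
# grep Huge /proc/meminfo

Then restart the libvirt daemon so it picks up the hugetlbfs mount; the
service name varies by distro (libvirtd on Fedora/RHEL, libvirt-bin on
Ubuntu), e.g.

# /etc/init.d/libvirtd restart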