On Thu, Jul 15, 2010 at 07:10:35PM +0200, Ralf Spenneberg wrote:
> Hi,
>
> I just had a chance to play with KVM on Ubuntu 10.04 LTS on some new HP
> 360 g6 with Nehalem processors. I have a feeling that KVM and NUMA on
> these machines do not play well together.
>
> Doing some benchmarks I got bizarre numbers. Sometimes the VMs were
> performing fine and sometimes the performance was very bad! Apparently
> KVM does not recognize the NUMA architecture and places memory and
> processes randomly, and therefore often on different NUMA cells.
>
> First a couple of specs of the machine:
> Two Nehalem sockets with E5520, Hyperthreading turned off, 4 cores per
> socket, all in all 8 processors.
>
> Linux recognizes the NUMA architecture:
>
> # numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 2 4 6
> node 0 size: 12277 MB
> node 0 free: 9183 MB
> node 1 cpus: 1 3 5 7
> node 1 size: 12287 MB
> node 1 free: 8533 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10

If numactl --hardware works, then libvirt should work too, since libvirt
uses the numactl library to query the topology.

> So I have got two cells with 4 cores each.
>
> Virsh does not recognize the topology:
>
> # virsh capabilities
> <capabilities>
>   <host>
>     <cpu>
>       <arch>x86_64</arch>
>       <model>core2duo</model>
>       <topology sockets='2' cores='4' threads='1'/>
>       <feature name='lahf_lm'/>
> ..

The NUMA topology does not get put inside the <cpu> element. It is one
level up, in a <topology> element, e.g.

<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      ....snip....
    </cpu>
    ...snip...
    <topology>
      <cells num='2'>
        <cell id='0'>
          <cpus num='4'>
            <cpu id='0'/>
            <cpu id='1'/>
            <cpu id='2'/>
            <cpu id='3'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='4'>
            <cpu id='4'/>
            <cpu id='5'/>
            <cpu id='6'/>
            <cpu id='7'/>
          </cpus>
        </cell>
      </cells>
    </topology>

This shows 2 NUMA nodes (cells in libvirt terminology), each with 4 CPUs.

You can also query the free RAM in each node/cell:

# virsh freecell 0
0: 1922084 kB

# virsh freecell 1
1: 1035700 kB

From both of these you can then decide where to place the guest.

> I guess this is the case, because QEMU does not recognize the
> NUMA architecture (QEMU monitor):
>
> (qemu) info numa
> 0 nodes

IIRC this is reporting the guest NUMA topology, which is completely
independent of the host NUMA topology.

> So apparently KVM does not utilize the NUMA architecture. Did I do
> something wrong? Is KVM missing a patch? Do I need to activate something
> in KVM to recognize the NUMA architecture?

There are two aspects to NUMA:

 1. Placing QEMU on appropriate NUMA nodes.
 2. Defining the guest NUMA topology.

By default QEMU will float freely across any CPUs, and all the guest RAM
will appear in one node. This can be bad for performance, especially if
you are benchmarking.

So for performance testing you definitely want to bind QEMU to the CPUs
within a single NUMA node at startup. This means that all memory accesses
are local to that node, unless you give the guest more virtual RAM than
there is free RAM on the local NUMA node.

Since you suggest you're using libvirt, the low-level way to do this is
via the <vcpu> element in the guest XML.

In my capabilities XML example above you can see 2 NUMA nodes, each with
4 CPUs. So if I wanted to restrict the guest to the first NUMA node, which
has CPU numbers 0, 1, 2, 3, then I'd do:

<domain type='kvm' id='8'>
  <name>rhel6x86_64</name>
  <uuid>0bbf8187-bce1-bc77-2a2c-fb033816f7f4</uuid>
  <memory>819200</memory>
  <currentMemory>819200</currentMemory>
  <vcpu cpuset='0-3'>2</vcpu>
  ...snip...
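As a side note, if the guest is already running you can get much the same
effect per-vCPU with virsh vcpupin instead of editing the XML. A rough
sketch, reusing the example domain name from the XML above (IIRC this only
affects the running domain, whereas the cpuset attribute is applied at
startup):

# virsh vcpupin rhel6x86_64 0 0-3
# virsh vcpupin rhel6x86_64 1 0-3

This pins vCPU 0 and vCPU 1 to host CPUs 0-3.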
You can verify the pinning with virsh vcpuinfo:

# virsh vcpuinfo rhel5xen
VCPU:           0
CPU:            1
State:          running
CPU time:       15.9s
CPU Affinity:   yyyy----

VCPU:           1
CPU:            2
State:          running
CPU time:       9.5s
CPU Affinity:   yyyy----
....snip rest...

It is not yet possible to define the guest-visible NUMA topology via
libvirt, but that shouldn't be too critical for performance unless you
need your guest to be able to span multiple host nodes.

For further performance you also really want to enable hugepages on your
host (e.g. mount hugetlbfs at /dev/hugepages), then restart the libvirtd
daemon, and then add the following to your guest XML just after the
<memory> element (a rough sketch of the host-side setup is appended
below my sig):

  <memoryBacking>
    <hugepages/>
  </memoryBacking>

This will make it pre-allocate hugepages for all guest RAM at startup.
NB the downside is that you can't overcommit RAM, but that's a tradeoff
between maximising utilization and maximising performance.

Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
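The host-side hugepage setup mentioned above might look roughly like the
following; the page count is only an illustration (with 2 MB pages, 4096
pages = 8 GB), so size it to cover the RAM of the guests you want backed
by hugepages:

# mkdir -p /dev/hugepages
# mount -t hugetlbfs hugetlbfs /dev/hugepages
# echo 4096 > /proc/sys/vm/nr_hugepages
# grep Huge /proc/meminfo

Then restart the libvirt daemon so it picks up the hugetlbfs mount; the
service name varies by distro (libvirtd on Fedora/RHEL, libvirt-bin on
Ubuntu), e.g.

# /etc/init.d/libvirtd restart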