On Wed, Jun 13, 2007 at 01:48:21PM -0400, Daniel Veillard wrote:
> On Wed, Jun 13, 2007 at 10:40:40AM -0500, Ryan Harper wrote:
> > Hello all,
>
> Hello Ryan,
>
> > I wanted to start a discussion on how we might get libvirt to be able to
> > probe the NUMA topology of Xen and Linux (for QEMU/KVM). In Xen, I've
> > recently posted patches for exporting topology into the [1]physinfo
> > hypercall, as well as adding a [2]hypercall to probe the Xen heap. I
> > believe the topology and memory info is already available in Linux.
> > With these, we have enough information to be able to write some simple
> > policy above libvirt that can create guests in a NUMA-aware fashion.
> >
> > I'd like to suggest the following for discussion:
> >
> > (1) A function to discover topology
> > (2) A function to check available memory
> > (3) Specifying which CPUs to use prior to domain start
>
> Okay, but let's start by defining the scope a bit. Historically NUMA
> has explored various paths, and I assume we are going to work in a rather
> small subset of what NUMA (Non Uniform Memory Access) has meant over time.
>
> I assume the following, tell me if I'm wrong:
> - we are just considering memory and processor affinity
> - the topology, i.e. the affinity between the processors and the various
>   memory areas, is fixed and the kind of mapping is rather simple
>
> To get into more specifics:
> - we will need to expand the model of libvirt (http://libvirt.org/intro.html)
>   to split the Node resources into separate sets containing processors
>   and memory areas which are highly connected together (assuming the
>   model is a simple partition of the resources between the equivalent
>   of sub-Nodes)
> - the function (2) would, for a given processor, tell how much of its
>   memory is already allocated (to existing running or paused domains)
>
> Right? Is the partition model sufficient for the architectures?
> If yes, then we will need a new definition and terminology for those
> sub-Nodes.

We have 3 core models we should refer to when deciding how to present
things:

 - Linux/Solaris Xen - hypercalls
 - Linux non-Xen     - libnuma
 - Solaris non-Xen   - liblgrp

The Xen & Linux modelling seems reasonably similar IIRC, but Solaris
takes a slightly different representational approach (sketches of the
latter two are at the end of this mail).

> For (3) we already have support for pinning the domain virtual CPUs to
> physical CPUs, but I guess it's not sufficient because you want this to
> be activated from the definition of the domain:
>
> http://libvirt.org/html/libvirt-libvirt.html#virDomainPinVcpu
>
> So the XML format would have to be extended to allow specifying the
> subset of processors the domain is supposed to start on:

Yeah, I've previously argued against including VCPU pinning information
in the XML since it's a tunable, not a hardware description. Reluctantly,
though, we'll have to add this VCPU info, since it's an absolute
requirement for this info to be provided at time of domain creation for
NUMA support.

> http://libvirt.org/format.html
>
> I would assume that if nothing is specified, the underlying Hypervisor
> (in libvirt terminology, that could be a Linux kernel in practice) will
> by default try to do the optimal placement by itself, i.e. (3) is only
> useful if you want to override the default behaviour.

Yes, that is correct. We should not change the default - let the OS apply
whatever policy it sees fit by default, since over time OSes are tending
towards being able to automagically determine & apply NUMA policy.
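
For concreteness, here is roughly what (1) and (2) look like through
libnuma on a non-Xen Linux host - a minimal sketch, assuming a kernel
with NUMA support (compile with -lnuma):

  #include <stdio.h>
  #include <numa.h>

  int main(void)
  {
      if (numa_available() < 0) {
          fprintf(stderr, "host has no NUMA support\n");
          return 1;
      }

      /* Nodes are numbered 0..numa_max_node(); for each one libnuma
       * reports total memory and, via the second argument, how much
       * of it is currently free - i.e. exactly the data (2) needs. */
      int maxnode = numa_max_node();
      for (int node = 0; node <= maxnode; node++) {
          long long freemem;
          long long size = numa_node_size64(node, &freemem);
          printf("node %d: %lld MB total, %lld MB free\n",
                 node, size >> 20, freemem >> 20);
      }
      return 0;
  }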
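
The Solaris model is not a flat list of nodes but a hierarchy of
"locality groups", which is why its representation differs. A sketch of
walking that tree with liblgrp, going from my memory of the lgrp_user
API (compile with -llgrp), so treat the details as approximate:

  #include <stdio.h>
  #include <sys/lgrp_user.h>

  /* Recursively print each lgroup and the free memory directly
   * attached to it. */
  static void walk(lgrp_cookie_t cookie, lgrp_id_t lgrp, int depth)
  {
      long long freemem = lgrp_mem_size(cookie, lgrp, LGRP_MEM_SZ_FREE,
                                        LGRP_CONTENT_DIRECT);
      printf("%*slgroup %d: %lld MB free\n",
             depth * 2, "", (int)lgrp, freemem >> 20);

      /* A NULL array asks only for the number of children. */
      int n = lgrp_children(cookie, lgrp, NULL, 0);
      if (n <= 0)
          return;
      lgrp_id_t children[n];
      lgrp_children(cookie, lgrp, children, n);
      for (int i = 0; i < n; i++)
          walk(cookie, children[i], depth + 1);
  }

  int main(void)
  {
      lgrp_cookie_t cookie = lgrp_init(LGRP_VIEW_OS);
      if (cookie == LGRP_COOKIE_NONE)
          return 1;
      walk(cookie, lgrp_root(cookie), 0);
      lgrp_fini(cookie);
      return 0;
  }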
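
And for (3), the existing virDomainPinVcpu() call Daniel linked above
can already pin a running domain's VCPUs; the gap is only that it cannot
be expressed at creation time. A sketch of driving the current API (the
domain name "demo" is made up; link with -lvirt):

  #include <stdio.h>
  #include <libvirt/libvirt.h>

  int main(void)
  {
      virConnectPtr conn = virConnectOpen(NULL);
      if (!conn)
          return 1;

      virDomainPtr dom = virDomainLookupByName(conn, "demo");
      if (!dom) {
          virConnectClose(conn);
          return 1;
      }

      /* The map is one byte per 8 physical CPUs; bit N set means the
       * VCPU may run on physical CPU N. 0x03 = CPUs 0 and 1. */
      unsigned char cpumap = 0x03;
      for (unsigned int vcpu = 0; vcpu < 2; vcpu++) {
          if (virDomainPinVcpu(dom, vcpu, &cpumap, 1) < 0)
              fprintf(stderr, "failed to pin vcpu %u\n", vcpu);
      }

      virDomainFree(dom);
      virConnectClose(conn);
      return 0;
  }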
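
As for the XML extension itself, nothing is agreed yet; purely as a
strawman, a cpuset attribute on the existing vcpu element would be the
smallest change (hypothetical syntax, not a committed format):

  <domain type='xen'>
    <name>demo</name>
    <memory>524288</memory>
    <!-- hypothetical: restrict the 2 VCPUs to physical CPUs 0-3 -->
    <vcpu cpuset='0-3'>2</vcpu>
    ...
  </domain>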

Dan
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|