On Wed, Jun 13, 2007 at 01:48:21PM -0400, Daniel Veillard wrote:
> On Wed, Jun 13, 2007 at 10:40:40AM -0500, Ryan Harper wrote:
> > Hello all,
>
> Hello Ryan,
>
> > I wanted to start a discussion on how we might get libvirt to be able to
> > probe the NUMA topology of Xen and Linux (for QEMU/KVM). In Xen, I've
> > recently posted patches for exporting topology into the [1]physinfo
> > hypercall, as well as adding a [2]hypercall to probe the Xen heap. I
> > believe the topology and memory info is already available in Linux.
> > With these, we have enough information to be able to write some simple
> > policy above libvirt that can create guests in a NUMA-aware fashion.
> >
> > I'd like to suggest the following for discussion:
> >
> > (1) A function to discover topology
> > (2) A function to check available memory
> > (3) Specifying which CPUs to use prior to domain start
>
> Okay, but let's start by defining the scope a bit. Historically NUMA
> has explored various paths, and I assume we are going to work in a rather
> small subset of what NUMA (Non Uniform Memory Access) has meant over time.
>
> I assume the following, tell me if I'm wrong:
> - we are just considering memory and processor affinity
> - the topology, i.e. the affinity between the processors and the various
>   memory areas, is fixed and the kind of mapping is rather simple
>
> To get into more specifics:
> - we will need to expand the model of libvirt (http://libvirt.org/intro.html)
>   to split the Node resources into separate sets containing processors
>   and memory areas which are highly connected together (assuming the
>   model is a simple partition of the resources between the equivalent
>   of sub-Nodes)
> - the function (2) would, for a given processor, tell how much of its
>   memory is already allocated (to existing running or paused domains)
>
> Right? Is the partition model sufficient for the architectures?
> If yes, then we will need a new definition and terminology for those
> sub-Nodes.

We have 3 core models we should refer to when deciding how to present
things:

 - Linux/Solaris Xen - hypercalls
 - Linux non-Xen     - libnuma
 - Solaris non-Xen   - liblgrp

The Xen & Linux modelling seems reasonably similar IIRC, but Solaris
takes a slightly different representational approach (sketches of the
latter two are at the end of this mail).

> For (3) we already have support for pinning the domain virtual CPUs to
> physical CPUs, but I guess it's not sufficient because you want this to
> be activated from the definition of the domain:
>
> http://libvirt.org/html/libvirt-libvirt.html#virDomainPinVcpu
>
> So the XML format would have to be extended to allow specifying the
> subset of processors the domain is supposed to start on:

Yeah, I've previously argued against including VCPU pinning information
in the XML since it's a tunable, not a hardware description. Reluctantly,
though, we'll have to add this VCPU info, since it's an absolute
requirement for this info to be provided at time of domain creation for
NUMA support.

> http://libvirt.org/format.html
>
> I would assume that if nothing is specified, the underlying Hypervisor
> (in libvirt terminology, that could be a Linux kernel in practice) will
> by default try to do the optimal placement by itself, i.e. (3) is only
> useful if you want to override the default behaviour.

Yes, that is correct. We should not change the default - let the OS apply
whatever policy it sees fit by default, since over time OSes are tending
towards being able to automagically determine & apply NUMA policy.
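
For concreteness, here is roughly what (1) and (2) look like through
libnuma on a non-Xen Linux host - a minimal sketch, assuming a kernel
with NUMA support (compile with -lnuma):

  #include <stdio.h>
  #include <numa.h>

  int main(void)
  {
      if (numa_available() < 0) {
          fprintf(stderr, "host has no NUMA support\n");
          return 1;
      }

      /* Nodes are numbered 0..numa_max_node(); for each one libnuma
       * reports total memory and, via the second argument, how much
       * of it is currently free - i.e. exactly the data (2) needs. */
      int maxnode = numa_max_node();
      for (int node = 0; node <= maxnode; node++) {
          long long freemem;
          long long size = numa_node_size64(node, &freemem);
          printf("node %d: %lld MB total, %lld MB free\n",
                 node, size >> 20, freemem >> 20);
      }
      return 0;
  }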
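
The Solaris model is not a flat list of nodes but a hierarchy of
"locality groups", which is why its representation differs. A sketch of
walking that tree with liblgrp, going from my memory of the lgrp_user
API (compile with -llgrp), so treat the details as approximate:

  #include <stdio.h>
  #include <sys/lgrp_user.h>

  /* Recursively print each lgroup and the free memory directly
   * attached to it. */
  static void walk(lgrp_cookie_t cookie, lgrp_id_t lgrp, int depth)
  {
      long long freemem = lgrp_mem_size(cookie, lgrp, LGRP_MEM_SZ_FREE,
                                        LGRP_CONTENT_DIRECT);
      printf("%*slgroup %d: %lld MB free\n",
             depth * 2, "", (int)lgrp, freemem >> 20);

      /* A NULL array asks only for the number of children. */
      int n = lgrp_children(cookie, lgrp, NULL, 0);
      if (n <= 0)
          return;
      lgrp_id_t children[n];
      lgrp_children(cookie, lgrp, children, n);
      for (int i = 0; i < n; i++)
          walk(cookie, children[i], depth + 1);
  }

  int main(void)
  {
      lgrp_cookie_t cookie = lgrp_init(LGRP_VIEW_OS);
      if (cookie == LGRP_COOKIE_NONE)
          return 1;
      walk(cookie, lgrp_root(cookie), 0);
      lgrp_fini(cookie);
      return 0;
  }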
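
And for (3), the existing virDomainPinVcpu() call Daniel linked above
can already pin a running domain's VCPUs; the gap is only that it cannot
be expressed at creation time. A sketch of driving the current API (the
domain name "demo" is made up; link with -lvirt):

  #include <stdio.h>
  #include <libvirt/libvirt.h>

  int main(void)
  {
      virConnectPtr conn = virConnectOpen(NULL);
      if (!conn)
          return 1;

      virDomainPtr dom = virDomainLookupByName(conn, "demo");
      if (!dom) {
          virConnectClose(conn);
          return 1;
      }

      /* The map is one byte per 8 physical CPUs; bit N set means the
       * VCPU may run on physical CPU N. 0x03 = CPUs 0 and 1. */
      unsigned char cpumap = 0x03;
      for (unsigned int vcpu = 0; vcpu < 2; vcpu++) {
          if (virDomainPinVcpu(dom, vcpu, &cpumap, 1) < 0)
              fprintf(stderr, "failed to pin vcpu %u\n", vcpu);
      }

      virDomainFree(dom);
      virConnectClose(conn);
      return 0;
  }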
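
As for the XML extension itself, nothing is agreed yet; purely as a
strawman, a cpuset attribute on the existing vcpu element would be the
smallest change (hypothetical syntax, not a committed format):

  <domain type='xen'>
    <name>demo</name>
    <memory>524288</memory>
    <!-- hypothetical: restrict the 2 VCPUs to physical CPUs 0-3 -->
    <vcpu cpuset='0-3'>2</vcpu>
    ...
  </domain>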

Dan
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|