On Thu, Dec 01, 2011 at 10:55:20PM +0530, Dipankar Sarma wrote:
> On Wed, Nov 30, 2011 at 06:41:13PM +0100, Andrea Arcangeli wrote:
> > On Wed, Nov 30, 2011 at 09:52:37PM +0530, Dipankar Sarma wrote:
> > > create the guest topology correctly and optimize for NUMA. This
> > > would work for us.
> >
> > Even in the case of 1 guest that fits in one node, you're not going
> > to max out the full bandwidth of all memory channels with this.
> >
> > All qemu can do with ms_mbind/tbind is to create a vtopology that
> > matches the hardware topology. It has these limits:
> >
> > 1) it requires all userland applications to be modified to scan
> >    either the physical topology if run on the host, or the
> >    vtopology if run in the guest, to get the full benefit.
>
> Not sure why you would need that. qemu can reflect the
> topology based on -numa specifications and the corresponding
> ms_tbind/mbind in the FDT (in the case of Power; I guess ACPI
> tables for x86), and the guest kernel would detect this virtualized
> topology. So there is no need for two types of topologies afaics.
> It will all be reflected in /sys/devices/system/node in the guest.

The point is: what does a vtopology give you if you don't modify all
the applications running in the guest to use it? A vtopology in the
guest helps exactly as much as the topology on the host does: very
little, unless the applications are modified to use it, just like
qemu on the host had to be modified to use ms_tbind/mbind (see the
sketches at the end of this mail).

> > 2) it breaks across live migration if the host physical topology
> >    changes
>
> That is indeed an issue. Either the VM placement software needs to
> be really smart and migrate only VMs that fit well, or, more
> likely, we will have to find a way to make guest kernels aware of
> topology changes. But the latter has an impact on userspace as
> well, for applications that have optimized for NUMA.

Making the guest kernel aware of "memory" topology changes is going
to be a whole mess, or at least harder than memory hotplug.

> I agree. Specifying a NUMA topology for the guest can result in
> sub-optimal performance in some cases; it is a tradeoff.

I see it more as a limit of this solution, a limit common to all
hard bindings, than as a tradeoff.

> Agreed.

Yep, I just wanted to make it clear that those limits remain with
this solution. I'll try to teach knumad to detect thread<->memory
affinity too, with some logic; we'll see how well that can work.
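
For reference, the kind of two-node guest vtopology discussed above
would be created with something like this (an illustrative sketch;
the numbers are made up and the exact -numa syntax depends on the
qemu version):

    qemu-system-x86_64 -smp 4 -m 4096 \
        -numa node,mem=2048,cpus=0-1,nodeid=0 \
        -numa node,mem=2048,cpus=2-3,nodeid=1 \
        [...]

With ms_mbind/ms_tbind, qemu would then bind each virtual node's
memory and vcpus to a host node, and the guest would see the two
nodes under /sys/devices/system/node.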
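
And a minimal, untested sketch of the per-node awareness every
application would have to grow to get the full benefit, whether it
scans the physical topology on the host or the vtopology in the
guest. It uses libnuma (link with -lnuma); most error handling is
omitted:

#include <numa.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE (16 << 20)	/* 16M per node, arbitrary */

int main(void)
{
	int node, max;

	if (numa_available() < 0) {
		fprintf(stderr, "no NUMA support\n");
		return 1;
	}
	max = numa_max_node();
	for (node = 0; node <= max; node++) {
		/* allocate memory bound to this node */
		void *buf = numa_alloc_onnode(BUF_SIZE, node);
		if (!buf)
			continue;
		/* run on the node so the touch below is node-local */
		numa_run_on_node(node);
		memset(buf, 0, BUF_SIZE);
		printf("node %d: touched %d bytes locally\n",
		       node, BUF_SIZE);
		numa_free(buf, BUF_SIZE);
	}
	return 0;
}

On the host this scans the physical nodes; in a guest started with
-numa it scans the vtopology. That is the point of limit 1) above:
the benefit only materializes if the application itself does this
kind of scanning and binding.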