Re: [PATCH] KVM: PPC: Book3S HV: Make the guest MMU hash table size configurable

Avi Kivity <avi@xxxxxxxxxx> · Mon, 30 Apr 2012 11:31:42 +0300

On 04/30/2012 07:40 AM, Paul Mackerras wrote:
> On Sun, Apr 29, 2012 at 04:37:33PM +0300, Avi Kivity wrote:
>
> > How difficult is it to have the kernel resize the HPT on demand?
>
> Quite difficult, unfortunately.  The guest kernel knows the size of
> the HPT, and the paravirt interface for updating it relies on the
> guest knowing it, since it is used in the hash function (the computed
> hash is taken modulo the HPT size).
>
> And even if it were possible to notify the guest that the size was
> changing, since it is a hash table, changing the size requires
> traversing the table to move hash entries to their new locations.
> When reducing the size one only has to traverse the part that is going
> away, but even that will be at least half of the table since the size
> is always a power of 2.

I'm no x86 fan but I'm glad we have nothing like that over there.
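
To make the resize cost concrete, here is a minimal sketch of a
power-of-2 hash table whose bucket index is the hash taken modulo the
table size, as described above.  It is a toy model, not the real HPT:
the function names are hypothetical, buckets are unbounded Python lists
rather than fixed-size PTE groups, and the hash values are arbitrary
integers.

```python
def bucket_index(hash_value, table_size):
    """Index of the bucket for hash_value; table_size must be a power of 2."""
    assert table_size > 0 and table_size & (table_size - 1) == 0
    return hash_value & (table_size - 1)  # same as hash_value % table_size


def build_table(hashes, table_size):
    """Place every hash into its bucket in a fresh table."""
    table = [[] for _ in range(table_size)]
    for h in hashes:
        table[bucket_index(h, table_size)].append(h)
    return table


def shrink_by_half(table):
    """Halve the table; only the upper half has to be traversed.

    Entries in the lower half keep their index, because an index below
    size/2 is unchanged when taken modulo size/2.  Entries in the upper
    half (the part that is going away) must each be moved.
    """
    new_size = len(table) // 2
    new_table = [list(bucket) for bucket in table[:new_size]]
    moved = 0
    for bucket in table[new_size:]:
        for h in bucket:
            new_table[bucket_index(h, new_size)].append(h)
            moved += 1
    return new_table, moved
```

Even in this simplified model, halving the table walks half of the
buckets, and growing it would require visiting every bucket, since any
entry may belong in the newly added half.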

>
> >  Guest
> > size is meaningless in the presence of memory hotplug, and having
> > unprivileged userspace pin down large amounts of kernel memory is
> > undesirable.
>
> I agree.  The HPT is certainly not ideal.  However, it's what we have
> to deal with on POWER hardware.
>
> One idea I had is to reserve some contiguous physical memory at boot
> time, say a couple of percent of system memory, and use that as a pool
> to allocate HPTs from.  That would limit the impact on the rest of the
> system and also make it more likely that we can find the necessary
> amount of physically contiguous memory.

Doesn't that limit the number of guests that can run?
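
A rough way to see the limit, as a sketch only: the number of guests is
bounded by the pool size divided by the per-guest HPT size.  The 64 GiB
host and the 2% reservation below are assumptions picked for
illustration; the 16 MiB HPT is the figure used later in this message.
Fragmentation inside the pool is ignored.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def guests_supported(host_bytes, pool_percent, hpt_bytes):
    """Upper bound on guests when every HPT must come from a boot-time pool."""
    pool_bytes = host_bytes * pool_percent // 100
    return pool_bytes // hpt_bytes


# Hypothetical host: 64 GiB of RAM, 2% reserved, 16 MiB HPT per guest.
print(guests_supported(64 * GIB, 2, 16 * MIB))
```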

> > On x86 we grow and shrink the mmu resources in response to guest demand
> > and host memory pressure.  We can do this because the data structures
> > are not authoritative (don't know if that's the case for ppc) and
> > because they can be grown incrementally (pretty sure that isn't the case
> > on ppc).  Still, if we can do this at KVM_SET_USER_MEMORY_REGION time
> > instead of a separate ioctl, I think it's better.
>
> It's not practical to grow the HPT after the guest has started
> booting.  It is possible to have two HPTs: one that the guest sees,
> which can be in pageable memory, and another shadow HPT that the
> hardware uses, which has to be in physically contiguous memory.  In
> this model the size of the shadow HPT can be changed at will, at the
> expense of having to reestablish the entries in it, though that can be
> done on demand.  I have avoided that approach until now because it
> uses more memory and is slower than just having a single HPT.

This is similar to x86 in the pre-NPT/EPT days, and it is indeed slow.  I
guess we'll be stuck with the pv hash until you get nested lookups (at
least a nested hash lookup is just 3 accesses instead of 24).

How are limits managed?  Won't a user creating a thousand guests with a
16MB hash each bring a server to its knees?
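
For scale, a quick back-of-the-envelope calculation, treating 16MB as
16 MiB and assuming each HPT is pinned for the lifetime of its guest
(the helper name is hypothetical):

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def total_hpt_bytes(num_guests, hpt_bytes):
    """Total memory pinned by HPTs if every guest gets one of hpt_bytes."""
    return num_guests * hpt_bytes


total = total_hpt_bytes(1000, 16 * MIB)
print(total / GIB)  # 15.625 (GiB of pinned, physically contiguous memory)
```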

-- 
error compiling committee.c: too many arguments to function
