Re: [PATCH] KVM: PPC: Book3S HV: Make the guest MMU hash table size configurable

On Mon, Apr 30, 2012 at 11:31:42AM +0300, Avi Kivity wrote:
> On 04/30/2012 07:40 AM, Paul Mackerras wrote:
> > On Sun, Apr 29, 2012 at 04:37:33PM +0300, Avi Kivity wrote:
> >
> > > How difficult is it to have the kernel resize the HPT on demand?
> >
> > Quite difficult, unfortunately.  The guest kernel knows the size of
> > the HPT, and the paravirt interface for updating it relies on the
> > guest knowing it, since it is used in the hash function (the computed
> > hash is taken modulo the HPT size).
> >
> > And even if it were possible to notify the guest that the size was
> > changing, since it is a hash table, changing the size requires
> > traversing the table to move hash entries to their new locations.
> > When reducing the size one only has to traverse the part that is going
> > away, but even that will be at least half of the table since the size
> > is always a power of 2.
> 
> I'm no x86 fan but I'm glad we have nothing like that over there.

:)
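For illustration, the cost Paul describes can be sketched as follows. The bucket index is the computed hash taken modulo the table size, so any size change can relocate entries and the whole affected region must be traversed. The `hash_fn` below is an arbitrary stand-in, not the Power ISA hash function, and the list-of-buckets table is only a model of the HPT:

```python
# Illustrative sketch only: the real HPT hash (per the Power ISA) mixes
# the VSID and virtual page number; hash_fn here is a stand-in.
def hash_fn(key):
    return (key * 2654435761) & 0xFFFFFFFF  # multiplicative hash, stand-in

def resize(entries, old_size, new_size):
    """Rebuild the table at new_size; count entries whose bucket moves."""
    moved = sum(1 for k in entries
                if hash_fn(k) % old_size != hash_fn(k) % new_size)
    table = [[] for _ in range(new_size)]
    for k in entries:
        table[hash_fn(k) % new_size].append(k)
    return table, moved
```

Halving a power-of-2 table illustrates the point in the quoted text: every entry whose bucket lies in the disappearing upper half must move, so at least half the table has to be walked.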

> >
> > >  Guest
> > > size is meaningless in the presence of memory hotplug, and having
> > > unprivileged userspace pin down large amounts of kernel memory is
> > > undesirable.
> >
> > I agree.  The HPT is certainly not ideal.  However, it's what we have
> > to deal with on POWER hardware.
> >
> > One idea I had is to reserve some contiguous physical memory at boot
> > time, say a couple of percent of system memory, and use that as a pool
> > to allocate HPTs from.  That would limit the impact on the rest of the
> > system and also make it more likely that we can find the necessary
> > amount of physically contiguous memory.
> 
> Doesn't that limit the number of guests that can run?

It does, but so does the amount of physical memory in the host.  I
believe that with 2% to 3% of the host memory reserved for HPTs, we'll
run out of memory for the guests before we run out of HPTs (even with
KSM).
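As a back-of-envelope check of that claim, using the figures quoted in this thread (a 128GB host, roughly 2.5% reserved, and 16MB versus 1MB HPTs; the exact percentage is an illustrative assumption, not a fixed design point):

```python
# Rough arithmetic for the proposed boot-time HPT pool.
GIB = 1024 ** 3
MIB = 1024 ** 2

host_mem = 128 * GIB
pool = int(host_mem * 0.025)       # ~3.2 GB reserved for HPTs

guests_16mb = pool // (16 * MIB)   # guests with the large default HPT
guests_1mb = pool // (1 * MIB)     # guests with a right-sized 1 MB HPT

print(guests_16mb, guests_1mb)
```

Even with 16MB HPTs the pool supports a couple of hundred guests, which is on the order of the 300-400 guests that exhaust host memory anyway; with 1MB HPTs the pool is nowhere near the limiting factor.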

> > > On x86 we grow and shrink the mmu resources in response to guest demand
> > > and host memory pressure.  We can do this because the data structures
> > > are not authoritative (don't know if that's the case for ppc) and
> > > because they can be grown incrementally (pretty sure that isn't the case
> > > on ppc).  Still, if we can do this at KVM_SET_USER_MEMORY_REGION time
> > > instead of a separate ioctl, I think it's better.
> >
> > It's not practical to grow the HPT after the guest has started
> > booting.  It is possible to have two HPTs: one that the guest sees,
> > which can be in pageable memory, and another shadow HPT that the
> > hardware uses, which has to be in physically contiguous memory.  In
> > this model the size of the shadow HPT can be changed at will, at the
> > expense of having to reestablish the entries in it, though that can be
> > done on demand.  I have avoided that approach until now because it
> > uses more memory and is slower than just having a single HPT.
> 
> This is similar to x86 in the pre npt/ept days, it's indeed slow.  I
> guess we'll be stuck with the pv hash until you get nested lookups (at
> least a nested hash lookup is just 3 accesses instead of 24).

How do you get 24?  Naively I would have thought that with a 4-level
guest page table and a 4-level host page table you would get 16
accesses.  I have seen a research paper that shows that those accesses
can be cached really well, whereas accesses in a hash generally don't
cache well at all.
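(One common way to arrive at 24, stated here as an assumption about how the figure was counted: in a two-dimensional radix-on-radix walk, each guest table entry is addressed by a guest-physical address, so reading it costs a full host walk plus the read itself, and the final guest PA needs one more host walk. That gives (g+1)(h+1)-1 rather than the naive g*h.)

```python
def two_d_walk_accesses(guest_levels, host_levels):
    # Reading each of the guest_levels entries costs a host walk
    # (host_levels accesses) to translate its guest-physical address,
    # plus 1 for the read itself; the final guest PA costs one more
    # host walk.  Total: g*(h+1) + h = (g+1)*(h+1) - 1.
    return (guest_levels + 1) * (host_levels + 1) - 1

print(two_d_walk_accesses(4, 4))   # the 24 figure for 4-level/4-level
print(4 * 4)                       # 16, the naive count
```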

> How are limits managed?  Won't a user creating a thousand guests with a
> 16MB hash each bring a server to its knees?

Well, that depends on how much memory the server has.  In my
experience the limit seems to be about 300 to 400 guests on a POWER7
with 128GB of RAM; that's with each guest getting 0.5GB of RAM (about
the minimum needed to boot Fedora or RHEL successfully) and using KSM.
Beyond that the host runs really short of memory and starts thrashing.  It
seems to be the guest memory that consumes the memory rather than the
HPTs, which are much smaller.  And for a 0.5GB guest, a 1MB HPT is
ample, so 1000 guests then only use up 1GB.  Part of the point of my
patch is to allow userspace to make the HPT be 1MB rather than 16MB
for small guests like these.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

