On 04/30/2012 07:40 AM, Paul Mackerras wrote:
> On Sun, Apr 29, 2012 at 04:37:33PM +0300, Avi Kivity wrote:
>
> > How difficult is it to have the kernel resize the HPT on demand?
>
> Quite difficult, unfortunately.  The guest kernel knows the size of
> the HPT, and the paravirt interface for updating it relies on the
> guest knowing it, since it is used in the hash function (the computed
> hash is taken modulo the HPT size).
>
> And even if it were possible to notify the guest that the size was
> changing, since it is a hash table, changing the size requires
> traversing the table to move hash entries to their new locations.
> When reducing the size one only has to traverse the part that is going
> away, but even that will be at least half of the table since the size
> is always a power of 2.

I'm no x86 fan but I'm glad we have nothing like that over there.

> > Guest
> > size is meaningless in the presence of memory hotplug, and having
> > unprivileged userspace pin down large amounts of kernel memory is
> > undesirable.
>
> I agree.  The HPT is certainly not ideal.  However, it's what we have
> to deal with on POWER hardware.
>
> One idea I had is to reserve some contiguous physical memory at boot
> time, say a couple of percent of system memory, and use that as a pool
> to allocate HPTs from.  That would limit the impact on the rest of the
> system and also make it more likely that we can find the necessary
> amount of physically contiguous memory.

Doesn't that limit the number of guests that can run?

> > On x86 we grow and shrink the mmu resources in response to guest demand
> > and host memory pressure.  We can do this because the data structures
> > are not authoritative (don't know if that's the case for ppc) and
> > because they can be grown incrementally (pretty sure that isn't the case
> > on ppc).  Still, if we can do this at KVM_SET_USER_MEMORY_REGION time
> > instead of a separate ioctl, I think it's better.
>
> It's not practical to grow the HPT after the guest has started
> booting.  It is possible to have two HPTs: one that the guest sees,
> which can be in pageable memory, and another shadow HPT that the
> hardware uses, which has to be in physically contiguous memory.  In
> this model the size of the shadow HPT can be changed at will, at the
> expense of having to reestablish the entries in it, though that can be
> done on demand.  I have avoided that approach until now because it
> uses more memory and is slower than just having a single HPT.

This is similar to x86 in the pre-NPT/EPT days; it's indeed slow.  I
guess we'll be stuck with the pv hash until you get nested lookups (at
least a nested hash lookup is just 3 accesses instead of 24).

How are limits managed?  Won't a user creating a thousand guests with a
16MB hash each bring a server to its knees?

-- 
error compiling committee.c: too many arguments to function
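
For readers following the thread, here is a rough toy sketch of two points discussed above: why shrinking a power-of-two hash table indexed by "hash modulo size" forces the entries in the discarded half to move, and how much memory a thousand 16MB hash tables would pin. This is not the real POWER HPT hash function or the KVM code; the helper names (slot, moved_on_halving, total_hpt_gib) are hypothetical and exist only for illustration, and 16MB is taken to mean 16 MiB.

```python
# Toy model only: the real POWER HPT hash and entry layout are not
# reproduced here. All function names are hypothetical.

def slot(hash_value: int, table_slots: int) -> int:
    """Index into a power-of-two table: hash taken modulo the table size."""
    if table_slots <= 0 or table_slots & (table_slots - 1):
        raise ValueError("table size must be a power of 2")
    return hash_value & (table_slots - 1)


def moved_on_halving(hashes, old_slots: int):
    """Return the hashes whose slot changes when the table is halved."""
    new_slots = old_slots // 2
    return [h for h in hashes if slot(h, old_slots) != slot(h, new_slots)]


def total_hpt_gib(guests: int, hpt_mib: int) -> float:
    """Memory consumed by one fixed-size hash table per guest, in GiB."""
    return guests * hpt_mib / 1024


# With uniformly spread hashes, half of the entries live in the half of
# the table that goes away, so half of them have to be relocated.
hashes = list(range(16))
moved = moved_on_halving(hashes, old_slots=8)
print(len(moved), "of", len(hashes), "entries move")

# The "thousand guests with a 16MB hash each" case from the question above.
print(total_hpt_gib(1000, 16), "GiB of hash tables")
```

The same arithmetic applies to the reserved-pool idea: under these assumptions, a pool sized at a couple of percent of system memory bounds the number of fixed-size hash tables, and hence guests, that can exist at once.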