Re: [RFC PATCH 0/2] Expose available KVM free memory slot count to help avoid aborts

Avi Kivity <avi@xxxxxxxxxx> · Tue, 25 Jan 2011 16:53:44 +0200

On 01/25/2011 04:41 PM, Alex Williamson wrote:
>  >  >
>  >  >
>  >  >  kvm: Allow memory slot array to grow on demand
>  >  >
>  >  >  Remove fixed KVM_MEMORY_SLOTS limit, allowing the slot array
>  >  >  to grow on demand.  Private slots are now allocated at the
>  >  >  front instead of the end.  Only x86 seems to use private slots,
>  >
>  >  Hmm, doesn't current user space expect slots 8..11 to be the private
>  >  ones, and wouldn't it cause trouble if slots 0..3 are suddenly reserved?

>  The private slots aren't currently visible to userspace; they're
>  actually slots 32..35.  The patch automatically increments user-passed
>  slot ids so userspace has its own zero-based view of the array.
>  Frankly, I don't understand why userspace reserves slots 8..11.  Is this
>  for compatibility with older kernel implementations?

I think so.  I believe these kernel versions are too old now to matter, 
but of course I can't be sure.
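
The id translation described above amounts to a fixed offset.  A minimal C sketch, assuming the private slots occupy internal indices 0..nr_private-1 (four on x86, matching the old 32..35 range, and zero on other archs); NR_PRIVATE_SLOTS, user_slot_to_index() and index_is_private() are hypothetical names, not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: number of kernel-private slots kept at the front of the
 * internal array (4 on x86 in this thread, 0 elsewhere). */
#define NR_PRIVATE_SLOTS 4

/* Translate a zero-based userspace slot id into an index in the internal
 * memslot array, skipping over the private slots at the front. */
static int user_slot_to_index(int user_slot, int nr_private)
{
    assert(user_slot >= 0 && nr_private >= 0);
    return user_slot + nr_private;
}

/* True if an internal index refers to a private (kernel-only) slot,
 * i.e. one that no user-supplied id can ever map to. */
static bool index_is_private(int index, int nr_private)
{
    return index < nr_private;
}
```

With four private slots, userspace slot 0 lands at internal index 4, and indices 0..3 stay out of reach of any user-supplied id.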

>  >  >  so this is now zero for all other archs.  The memslots pointer
>  >  >  is already updated using rcu, so changing the size of the
>  >  >  array when it's replaced is straightforward.  x86 also keeps
>  >  >  a bitmap of slots used by a kvm_mmu_page, which requires a
>  >  >  shadow tlb flush whenever we increase the number of slots.
>  >  >  This forces the pages to be rebuilt with the new bitmap size.
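
Growing the array on demand, as the patch description puts it, means building a larger copy and swapping a single pointer.  A userspace C sketch under stated assumptions: struct memslot, struct memslots, memslots_grow() and bitmap_longs() are hypothetical, the per-page slot bitmap is assumed to be an array of unsigned long sized by slot count, and the RCU publish/free step the patch relies on is only described in comments, not implemented:

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Simplified slot: only a base guest frame number and a size in pages. */
struct memslot {
    unsigned long base_gfn;
    unsigned long npages;
};

/* Length stored alongside the entries, so the whole array can be replaced
 * by swapping one pointer. */
struct memslots {
    int nslots;
    struct memslot slots[];
};

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned longs a per-page slot bitmap needs for nslots slots. */
static size_t bitmap_longs(int nslots)
{
    return ((size_t)nslots + BITS_PER_ULONG - 1) / BITS_PER_ULONG;
}

/* Return a copy of old enlarged to new_nslots entries (new entries zeroed),
 * or NULL on allocation failure.  old is not modified, so readers may keep
 * using it.  In the kernel the caller would publish the result with
 * rcu_assign_pointer() and free old only after a grace period; this sketch
 * does neither. */
static struct memslots *memslots_grow(const struct memslots *old, int new_nslots)
{
    int old_n = old ? old->nslots : 0;
    struct memslots *new;

    assert(new_nslots >= old_n);
    new = calloc(1, sizeof(*new) + (size_t)new_nslots * sizeof(new->slots[0]));
    if (!new)
        return NULL;
    new->nslots = new_nslots;
    if (old_n)
        memcpy(new->slots, old->slots, (size_t)old_n * sizeof(old->slots[0]));
    return new;
}
```

bitmap_longs() illustrates why the flush is needed: once the slot count grows past a multiple of the word size, a bitmap sized for the old count is too short, so existing shadow pages cannot keep their old bitmaps.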
>  >
>  >  Is it possible for user space to increase the slot number to ridiculous
>  >  amounts (at least as far as kmalloc allows) and then trigger a kernel
>  >  walk through them in non-preemptible contexts? Just wondering, I haven't
>  >  checked all contexts of functions like kvm_is_visible_gfn yet.
>  >
>  >  If so, we should switch to an rbtree or something like that already.
>  >  Otherwise that may wait a bit, but probably not too long.

>  Yeah, Avi has pointed out the hole: userspace can exploit this
>  interface with these changes.  However, for 99+% of users, this change
>  leaves the slot array at about the same size, or makes it smaller.  Only
>  huge, scale-out guests would probably even see a doubling of slots (my
>  guest with 14 82576 VFs uses 48 slots).  On the kernel side, I think we
>  can safely save a tree implementation as a later optimization should we
>  determine it's necessary.  We'll have to see how the userspace side
>  shapes up to figure out what's best there.  Thanks,

A tree would probably be a pessimization until we are able to cache the 
result of lookups.  That's because the linear scan generates a very 
simple pattern of branch predictions and memory accesses, while a tree 
uses a whole bunch of cachelines and generates unpredictable branches 
(if the inputs are unpredictable).
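
The access pattern contrasted with a tree here is a plain in-order scan.  A minimal C sketch of such a gfn-to-slot lookup, assuming a simplified struct memslot with only base_gfn and npages (the real KVM structure has more fields, and find_slot_linear() is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>

struct memslot {
    unsigned long base_gfn;  /* first guest frame number in the slot */
    unsigned long npages;    /* slot size in pages; 0 means unused */
};

/* Walk the array in order and return the slot containing gfn, or NULL.
 * Each iteration performs the same comparisons on the next element of a
 * contiguous array, so memory is read sequentially. */
static const struct memslot *find_slot_linear(const struct memslot *slots,
                                              int nslots, unsigned long gfn)
{
    assert(nslots >= 0 && (nslots == 0 || slots != NULL));
    for (int i = 0; i < nslots; i++) {
        const struct memslot *s = &slots[i];
        /* gfn - base_gfn < npages avoids overflow in base_gfn + npages. */
        if (gfn >= s->base_gfn && gfn - s->base_gfn < s->npages)
            return s;
    }
    return NULL;
}
```

A balanced tree would instead follow pointers to nodes scattered across memory, with each branch direction depending on the key being looked up.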

Note that with TDP most lookups result in failure, so all we need is a 
fast way to determine whether to perform the lookup at all or not.  That 
can be done by caching the last lookup for this address in the spte by 
setting reserved bits.  For the other lookups, which we believe will
succeed, we can assume the probability of a match is related to the slot
size, and sort the slots by page count.
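
The size-ordering heuristic can be sketched by sorting the array once, largest slot first, and keeping the same linear scan.  This C sketch assumes a lookup is more likely to hit a bigger slot; the names are hypothetical, and the spte reserved-bit cache for failing lookups is not modeled:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct memslot {
    unsigned long base_gfn;
    unsigned long npages;
};

/* qsort comparator: larger slots first.  Compares explicitly instead of
 * subtracting, since the fields are unsigned long. */
static int cmp_npages_desc(const void *a, const void *b)
{
    const struct memslot *x = a, *y = b;

    if (x->npages > y->npages)
        return -1;
    if (x->npages < y->npages)
        return 1;
    return 0;
}

/* Reorder slots so the biggest ones, assumed most likely to match, are
 * tested first by the scan below. */
static void sort_slots_by_size(struct memslot *slots, int nslots)
{
    assert(nslots >= 0);
    qsort(slots, (size_t)nslots, sizeof(slots[0]), cmp_npages_desc);
}

/* Same linear scan as an unsorted lookup; only the order differs. */
static const struct memslot *find_slot(const struct memslot *slots,
                                       int nslots, unsigned long gfn)
{
    for (int i = 0; i < nslots; i++)
        if (gfn >= slots[i].base_gfn &&
            gfn - slots[i].base_gfn < slots[i].npages)
            return &slots[i];
    return NULL;
}
```

Sorting reorders the array, so anything that addresses slots by id would need a separate id-to-position map; the sketch ignores that.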

--
error compiling committee.c: too many arguments to function
