On Wed, May 02, 2012 at 02:52:14PM +0200, Alexander Graf wrote:
> >+	/*
> >+	 * If the user wants a different size from default,
> >+	 * try first to allocate it from the kernel page allocator.
> >+	 */
> >+	hpt = 0;
> >+	if (order != kvm_hpt_order) {
> > 		hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
> >-				       __GFP_NOWARN, HPT_ORDER - PAGE_SHIFT);
> >+				       __GFP_NOWARN, order - PAGE_SHIFT);
> 
> I like the idea quite a lot, but shouldn't we set a max memory
> threshold through a kernel module option that a user space program
> could pin this way?

The page allocator will hand out at most 16MB with the default setting
of CONFIG_FORCE_MAX_ZONEORDER.  Any larger request will fail, and
userspace can only do one allocation per VM.

Of course, userspace can start hundreds of VMs and use up memory that
way - but that is exactly the same situation as we have already, where
we allocate 16MB per VM at VM creation time, so this patch makes
things no worse in that regard.

> >+long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
> >+{
> >+	long err = -EBUSY;
> >+	long order;
> >+
> >+	mutex_lock(&kvm->lock);
> >+	if (kvm->arch.rma_setup_done) {
> >+		kvm->arch.rma_setup_done = 0;
> >+		/* order rma_setup_done vs. vcpus_running */
> >+		smp_mb();
> >+		if (atomic_read(&kvm->arch.vcpus_running)) {
> 
> Is this safe? What if we're running one vcpu thread and one hpt
> reset thread. The vcpu thread starts, is just before the vcpu_run
> function, the reset thread checks if anything is running, nothing is
> running, the vcpu thread goes on to do its thing and boom things
> break.

No - in the situation you describe, the vcpu thread will see that
kvm->arch.rma_setup_done is clear and call kvmppc_hv_setup_htab_rma(),
where the first thing it does is mutex_lock(&kvm->lock), so it sits
there until the reset thread has finished and unlocked the mutex.  The
vcpu thread has to see rma_setup_done clear, since the reset thread
clears it and then has a barrier before testing vcpus_running.
If, on the other hand, the vcpu thread gets there first and sees that
kvm->arch.rma_setup_done is still set, then the reset thread must see
that kvm->arch.vcpus_running is non-zero, since the vcpu thread first
increments vcpus_running and then tests rma_setup_done, with a barrier
in between.

> But then again, what's the problem with having vcpus running while
> we're clearing the HPT? Stale TLB entries?

If a vcpu is running, it could call kvmppc_h_enter() to set an HPTE at
the exact same time that we are clearing that HPTE, and we could end
up with inconsistent results.  I didn't want to add extra locking in
kvmppc_h_enter() to make sure that couldn't happen, since
kvmppc_h_enter() is quite a hot path.  Instead I added a little bit of
overhead on vcpu entry/exit, since that happens much more rarely.  In
any case, if we're resetting the VM, there shouldn't be any vcpus
running.

Your comment has, however, reminded me that I need to arrange for a
full TLB flush to occur the first time we run each vcpu after the
reset.  I'll do a new version that includes that.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html