On Thu, Aug 09, 2012 at 03:16:12PM -0300, Marcelo Tosatti wrote:
> The !memslot->npages case is handled in __kvm_set_memory_region
> (please read that part, before kvm_arch_prepare_memory_region() call).
>
> kvm_arch_flush_shadow should be implemented.

Book3S HV doesn't have shadow page tables per se; rather, the hardware
page table is under the control of the hypervisor (i.e. KVM), and
entries are added and removed by the guest using hypercalls.  On recent
machines (POWER7) the hypervisor can choose whether or not to have the
hardware PTE point to a real page of memory; if it doesn't, access by
the guest will trap to the hypervisor.  On older machines (PPC970) we
don't have that flexibility, and we have to provide a real page of
memory (i.e. RAM or I/O) behind every hardware PTE.  (This is because
PPC970 provides no way for page faults in the guest to go to the
hypervisor.)

I could implement kvm_arch_flush_shadow to remove the backing pages
behind every hardware PTE, but that would be very slow and inefficient
on POWER7, and would break the guest on PPC970, particularly in the
case where userspace is removing a small memory slot containing some
I/O device and leaving the memory slot for system RAM untouched.  So
the reason for unmapping the hardware PTEs in
kvm_arch_prepare_memory_region rather than in kvm_arch_flush_shadow is
that that way we know which memslot is going away.

What exactly are the semantics of kvm_arch_flush_shadow?  I presume
that on x86 with NPT/EPT it basically does nothing - is that right?

> > +	if (old->npages) {
> > +		/* modifying guest_phys or flags */
> > +		if (old->base_gfn != memslot->base_gfn)
> > +			kvmppc_unmap_memslot(kvm, old);
>
> This case is also handled generically by the last kvm_arch_flush_shadow
> call in __kvm_set_memory_region.

Again, to use this we would need to know which memslot we're flushing.
If we could change __kvm_set_memory_region to pass the memslot for
these kvm_arch_flush_shadow calls, then I could do as you suggest.
(Though I would need to think carefully about what would happen with
guest invalidations of hardware PTEs in the interval between the
rcu_assign_pointer(kvm->memslots, slots) and the kvm_arch_flush_shadow,
and whether the invalidation would find the correct location in the
rmap array, given that we have updated the base_gfn in the memslot
without first getting rid of any references to those pages in the
hardware page table.)

> > +		if (memslot->dirty_bitmap &&
> > +		    old->dirty_bitmap != memslot->dirty_bitmap)
> > +			kvmppc_hv_get_dirty_log(kvm, old);
> > +		return 0;
> > +	}
>
> Better clear dirty log unconditionally on kvm_arch_commit_memory_region,
> similarly to x86 (just so it's consistent).

OK.

> > --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> > +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
> > @@ -81,7 +81,7 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
> >  		ptel = rev->guest_rpte |= rcbits;
> >  		gfn = hpte_rpn(ptel, hpte_page_size(hpte_v, ptel));
> >  		memslot = __gfn_to_memslot(kvm_memslots(kvm), gfn);
> > -		if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
> > +		if (!memslot)
> >  			return;
>
> Why remove this check?  (i don't know why it was there in the first
> place, just checking).

This is where we are removing the page backing a hardware PTE and thus
removing the hardware PTE from the reverse-mapping list for the page.
We want to be able to do that properly even if the memslot is in the
process of going away.
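
To make the distinction concrete, here is a sketch (not code from the
patch; the sketch_* helpers are invented for illustration): a path that
creates a new reference to a memslot page should honour
KVM_MEMSLOT_INVALID, whereas a teardown path like remove_revmap_chain()
only needs the slot to still be findable.

/*
 * Sketch only: the sketch_* names are invented for illustration.
 * An insertion path creates a new reference, so it must honour
 * KVM_MEMSLOT_INVALID and refuse to touch a slot that is going away.
 */
static long sketch_insert_hpte(struct kvm *kvm, unsigned long gfn)
{
	struct kvm_memory_slot *memslot;

	memslot = __gfn_to_memslot(kvm_memslots(kvm), gfn);
	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
		return H_PARAMETER;	/* create no new references */
	/* ... set up the HPTE and add it to the rmap chain ... */
	return H_SUCCESS;
}

/*
 * A removal path is dropping an existing reference; that has to keep
 * working while the slot is INVALID but not yet gone, hence only the
 * NULL check survives.
 */
static void sketch_remove_hpte(struct kvm *kvm, unsigned long gfn)
{
	struct kvm_memory_slot *memslot;

	memslot = __gfn_to_memslot(kvm_memslots(kvm), gfn);
	if (!memslot)
		return;
	/* ... unlink the HPTE from the memslot's rmap array ... */
}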
I had the flags check in there originally because other places that
used a memslot had that check.  But when I read __kvm_set_memory_region
more carefully, I realized that the KVM_MEMSLOT_INVALID flag indicates
that we should not create any more references to pages in the memslot;
we do still need to be able to handle references going away, i.e. pages
in the memslot getting unmapped.

Paul.