Re: [PATCH 4/5] KVM: PPC: Book3S HV: Don't give the guest RW access to RO pages

Paul Mackerras <paulus@xxxxxxxxx> · Sat, 24 Nov 2012 20:32:37 +1100

On Sat, Nov 24, 2012 at 10:05:37AM +0100, Alexander Graf wrote:
>
> On 23.11.2012, at 23:13, Paul Mackerras <paulus@xxxxxxxxx> wrote:
> 
> > On Fri, Nov 23, 2012 at 04:47:45PM +0100, Alexander Graf wrote:
> >> 
> >> On 22.11.2012, at 10:28, Paul Mackerras wrote:
> >> 
> >>> Currently, if the guest does an H_PROTECT hcall requesting that the
> >>> permissions on a HPT entry be changed to allow writing, we make the
> >>> requested change even if the page is marked read-only in the host
> >>> Linux page tables.  This is a problem since it would for instance
> >>> allow a guest to modify a page that KSM has decided can be shared
> >>> between multiple guests.
> >>> 
> >>> To fix this, if the new permissions for the page allow writing, we need
> >>> to look up the memslot for the page, work out the host virtual address,
> >>> and look up the Linux page tables to get the PTE for the page.  If that
> >>> PTE is read-only, we reduce the HPTE permissions to read-only.
> >> 
> >> How does KSM handle this usually? If you reduce the permissions to R/O, how do you ever get a R/W page from a deduplicated one?
> > 
> > The scenario goes something like this:
> > 
> > 1. Guest creates an HPTE with RO permissions.
> > 2. KSM decides the page is identical to another page and changes the
> >   HPTE to point to a shared copy.  Permissions are still RO.
> > 3. Guest decides it wants write access to the page and does an
> >   H_PROTECT hcall to change the permissions on the HPTE to RW.
> > 
> > The bug is that we actually make the requested change in step 3.
> > Instead we should leave it at RO, then when the guest tries to write
> > to the page, we take a hypervisor page fault, copy the page and give
> > the guest write access to its own copy of the page.
> > 
> > So what this patch does is add code to H_PROTECT so that if the guest
> > is requesting RW access, we check the Linux PTE to see if the
> > underlying guest page is RO, and if so reduce the permissions in the
> > HPTE to RO.
> 
> But this will be guest visible, because now H_PROTECT doesn't actually mark the page R/W in the HTAB, right?

No - the guest view of the HPTE has R/W permissions.  The guest view
of the HPTE is made up of doubleword 0 from the real HPT plus
rev->guest_rpte for doubleword 1 (where rev is the entry in the revmap
array, kvm->arch.revmap, for the HPTE).  The guest view can be
different from the host/hardware view, which is in the real HPT.  For
instance, the guest view of an HPTE might be valid but the host view
might be invalid because the underlying real page has been paged out -
in that case we use a software bit which we call HPTE_V_ABSENT to
remind ourselves that there is something valid there from the guest's
point of view.  Or the guest view can be R/W but the host view is RO,
as in the case where KSM has merged the page.
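
To make the two views concrete, here is a rough standalone sketch of how
the guest view could be assembled from the real HPTE and the revmap
entry.  Everything prefixed sk_/SK_ is made up for illustration: the
struct layouts are simplified and the bit values are placeholders, not
the real HPTE_V_* / HPTE_R_* definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for illustration only (not the kernel's). */
#define SK_HPTE_V_VALID   0x1ULL
#define SK_HPTE_V_ABSENT  0x2ULL  /* software bit: valid for the guest, absent for the host */
#define SK_HPTE_R_PP_RW   0x2ULL  /* stand-in for "protection bits allow writing" */

/* Simplified stand-in for a revmap entry. */
struct sk_revmap_entry {
	uint64_t guest_rpte;      /* doubleword 1 as the guest asked for it */
};

/* Simplified stand-in for one HPTE (two doublewords). */
struct sk_hpte {
	uint64_t v;               /* doubleword 0 */
	uint64_t r;               /* doubleword 1 */
};

/*
 * Guest view = doubleword 0 from the real HPT, with the software
 * "absent" bit presented as "valid", plus doubleword 1 from the revmap.
 */
struct sk_hpte sk_guest_view(const struct sk_hpte *real,
			     const struct sk_revmap_entry *rev)
{
	struct sk_hpte g;

	assert(real != NULL && rev != NULL);
	g.v = real->v;
	if (g.v & SK_HPTE_V_ABSENT) {
		g.v &= ~SK_HPTE_V_ABSENT;
		g.v |= SK_HPTE_V_VALID;
	}
	g.r = rev->guest_rpte;    /* may be R/W even if real->r is RO */
	return g;
}
```

With a real HPTE that is absent and RO, and a guest_rpte that asks for
R/W, this yields a guest view that is valid and R/W, which is the
situation described above.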

> So the flow with this patch is:
> 
>   - guest page permission fault

This comes through the host (kvmppc_hpte_hv_fault()) which looks at
the guest view of the HPTE, sees that it has RO permissions, and sends
the page fault to the guest.

>   - guest does H_PROTECT to mark page r/w
>   - H_PROTECT doesn't do anything
>   - guest returns from permission handler, triggers write fault

This comes once again to kvmppc_hpte_hv_fault(), which sees that the
guest view of the HPTE has R/W permissions now, and sends the page
fault to kvmppc_book3s_hv_page_fault(), which requests write access to
the page, possibly triggering copy-on-write or whatever, and updates
the real HPTE to have R/W permissions and possibly point to a new page
of memory.
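
As a toy model of that sequence (not the kernel code), the routing
decision can be sketched like this.  The sk_ names and the two-boolean
entry are hypothetical simplifications; the real code works on HPTE
protection bits, and the "retry" outcome is my assumption for the case
where both views already allow the write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One HPTE reduced to the two facts that matter for a write fault. */
struct sk_entry {
	bool guest_rw;  /* guest view (from rev->guest_rpte) allows writing */
	bool host_rw;   /* host/hardware view (real HPT) allows writing */
};

enum sk_route {
	SK_REFLECT_TO_GUEST,  /* guest's own HPTE forbids the write */
	SK_HOST_PAGE_FAULT,   /* guest view allows it, host view does not */
	SK_RETRY_ACCESS       /* both views allow it; nothing left to fix */
};

/* H_PROTECT with the patch: the guest view always gets what was asked
 * for, the host view gets R/W only if the Linux PTE is writable. */
void sk_h_protect(struct sk_entry *e, bool want_rw, bool linux_pte_writable)
{
	assert(e != NULL);
	e->guest_rw = want_rw;
	e->host_rw = want_rw && linux_pte_writable;
}

/* Decision made on a write fault, looking at the guest view first. */
enum sk_route sk_route_write_fault(const struct sk_entry *e)
{
	assert(e != NULL);
	if (!e->guest_rw)
		return SK_REFLECT_TO_GUEST;
	if (!e->host_rw)
		return SK_HOST_PAGE_FAULT;
	return SK_RETRY_ACCESS;
}

/* Host page fault path: get write access (copy-on-write if needed),
 * then upgrade the real HPTE. */
void sk_host_page_fault(struct sk_entry *e)
{
	assert(e != NULL && e->guest_rw);
	e->host_rw = true;
}
```

Starting from an entry that is RO in both views, the first write fault
is reflected to the guest; after sk_h_protect(&e, true, false) the next
one goes to the host page fault path, which upgrades the host view.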

> 
> 2 questions here:
> 
> How does the host know that the page is actually r/w?

I assume you mean RO?  It looks up the memslot for the guest physical
address (which it gets from rev->guest_rpte), uses that to work out
the host virtual address (i.e. the address in qemu's address space),
looks up the Linux PTE in qemu's Linux page tables, and looks at the
_PAGE_RW bit there.
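
A simplified standalone sketch of that lookup chain follows.  The
memslot fields mirror the usual base_gfn / npages / userspace_addr
idea, but the types, the 4k page shift, the SK_PAGE_RW value and the
flat "ptes" array standing in for qemu's Linux page tables are all
assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12        /* assumed page size for the example */
#define SK_PAGE_RW    0x1ULL    /* stand-in for _PAGE_RW */

/* Simplified memslot: a run of guest pages backed by host virtual memory. */
struct sk_memslot {
	uint64_t base_gfn;          /* first guest frame number in the slot */
	uint64_t npages;            /* number of pages in the slot */
	uint64_t userspace_addr;    /* host virtual address of the first page */
	const uint64_t *ptes;       /* toy page table: one flags word per page */
};

/* Find the memslot covering gfn, or NULL if there is none. */
const struct sk_memslot *sk_gfn_to_memslot(const struct sk_memslot *slots,
					   size_t nslots, uint64_t gfn)
{
	for (size_t i = 0; i < nslots; i++)
		if (gfn >= slots[i].base_gfn &&
		    gfn - slots[i].base_gfn < slots[i].npages)
			return &slots[i];
	return NULL;
}

/* Host virtual address (in qemu's address space) for a guest frame. */
uint64_t sk_gfn_to_hva(const struct sk_memslot *slot, uint64_t gfn)
{
	assert(slot != NULL);
	return slot->userspace_addr + ((gfn - slot->base_gfn) << SK_PAGE_SHIFT);
}

/* True if the host PTE backing this guest physical address is writable. */
bool sk_host_allows_write(const struct sk_memslot *slots, size_t nslots,
			  uint64_t gpa)
{
	uint64_t gfn = gpa >> SK_PAGE_SHIFT;
	const struct sk_memslot *slot = sk_gfn_to_memslot(slots, nslots, gfn);

	if (slot == NULL)
		return false;       /* no memslot: treat as not writable */
	/* The toy table is indexed by page; the real code walks the
	 * Linux page tables using the hva from sk_gfn_to_hva(). */
	return (slot->ptes[gfn - slot->base_gfn] & SK_PAGE_RW) != 0;
}
```

The guest physical address passed in corresponds to what comes out of
rev->guest_rpte; a page that KSM has merged would show up here with the
RW bit clear.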

> How does this work on 970? I thought page faults always go straight to the guest there.

They do, which is why PPC970 can't do any of this.  On PPC970 we have
kvm->arch.using_mmu_notifiers == 0, and that makes the code pin every
page of guest memory that is mapped by a guest HPTE (with a Linux
guest, that means every page, because of the linear mapping).  On
POWER7 we have kvm->arch.using_mmu_notifiers == 1, which enables
host paging and deduplication of guest memory.

Paul.