Re: [PATCH v5 5/5] KVM: PPC: e500: MMU API

On Mon, Jul 18, 2011 at 11:44:02PM +0200, Alexander Graf wrote:
> 
> On 18.07.2011, at 20:08, Scott Wood wrote:
> 
> > On Mon, 18 Jul 2011 18:33:58 +0200
> > Alexander Graf <agraf@xxxxxxx> wrote:
> > 
> >> 
> >> On 18.07.2011, at 18:18, Scott Wood wrote:
> >> 
> >>> They're pinned by get_user_pages_fast().  We (potentially) write to them, so
> >>> we should mark them dirty, because they are dirty.  It's up to the rest
> >>> of Linux what to do with that.  Will being pinned stop updates from being
> >>> written out if it is file-backed?  And eventually the vm will be destroyed
> >>> (or the tlb reconfigured) and the pages will be unpinned.
> >> 
> >> Hrm. How much overhead do we add to the exit-to-userspace path with this?
> > 
> > Not sure -- probably not too much for anonymous memory, compared to the
> > rest of the cost of a heavyweight exit.  On e500 the tlb array is 4 pages,
> > and each set_page_dirty_lock() will just do a few bit operations.
> 
> Hm, ok.
> 
> > 
> >> I completely agree that we should mark them dirty when closing, but I'm
> >> not fully convinced that a "we dirty them, so we should declare them
> >> dirty at random times" approach pays off against possible substantial
> >> slowdowns due to the marking. Keep in mind that this is the MMIO case,
> >> which isn't _that_ rare.
> > 
> > If we can convince ourselves nothing bad can happen, fine.  I did it here
> > because this is the point at which the API says the contents of the memory
> > are well-defined.  If it is file-backed, and userspace does a sync on a
> heavyweight exit, shouldn't the right thing get written to disk?  Could
> > any other weird things happen?  I'm not familiar enough with that part of
> > the kernel to say right away that it's safe.
> 
> Neither am I; these are pretty subtle grounds. CC'ing Andrea and
> Johannes. Guys, would you please take a look at that patch and tell
> us whether what's being done here is safe and a good idea?
> 
> We're talking about the following patch: http://www.spinics.net/lists/kvm-ppc/msg02961.html
> and specifically about:
> 
> > +void kvmppc_core_heavy_exit(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
> > +	int i;
> > +
> > +	/*
> > +	 * We may have modified the guest TLB, so mark it dirty.
> > +	 * We only do it on an actual return to userspace, to avoid
> > +	 * adding more overhead to getting scheduled out -- and avoid
> > +	 * any locking issues with getting preempted in the middle of
> > +	 * KVM_CONFIG_TLB, etc.
> > +	 */
> > +
> > +	for (i = 0; i < vcpu_e500->num_shared_tlb_pages; i++)
> > +		set_page_dirty_lock(vcpu_e500->shared_tlb_pages[i]);
> >  }
>  
> The background is that we want to have a few shared pages between
> kernel and user space to keep guest TLB information in. It's too
> much to shove around every time we need it in user space.

Is there a strict requirement to have these pages originate from
userspace?  Usually, shared memory between kernel and userspace is
owned by the driver and kept away from the mm subsystem completely.
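
For contrast, here is roughly what the pinning approach means in terms
of mm interaction over the life of the shared array (a minimal sketch
with made-up function names, not the actual code from the patch):

#include <linux/mm.h>

/* Pin the userspace pages backing the shared TLB array so they
 * cannot be reclaimed or migrated while the kernel writes to them. */
static int pin_shared_tlb(unsigned long uaddr, struct page **pages, int n)
{
        return get_user_pages_fast(uaddr, n, 1 /* write */, pages);
}

/* On a heavyweight exit: the kernel may have written guest TLB state
 * into these pages, so tell the mm subsystem they are dirty. */
static void dirty_shared_tlb(struct page **pages, int n)
{
        int i;

        for (i = 0; i < n; i++)
                set_page_dirty_lock(pages[i]);
}

/* On VM destruction or TLB reconfiguration: drop the page pins. */
static void unpin_shared_tlb(struct page **pages, int n)
{
        int i;

        for (i = 0; i < n; i++)
                put_page(pages[i]);
}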

You could allocate the memory in the driver when userspace issues an
ioctl like KVM_ALLOC_TLB_CONFIG and return a file descriptor that can
be mmap'd.  The array length info is maintained in the vma for the
kernel; userspace must remember the size of its mmap regions anyway.
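
A minimal sketch of that alternative (the ioctl name is from above;
everything else is illustrative, not actual KVM code):

#include <linux/anon_inodes.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

static int tlb_config_mmap(struct file *file, struct vm_area_struct *vma)
{
        /* The driver owns this buffer, so no page pinning or dirty
         * tracking is needed; the vma tracks the mapped length. */
        return remap_vmalloc_range(vma, file->private_data, vma->vm_pgoff);
}

static const struct file_operations tlb_config_fops = {
        .mmap   = tlb_config_mmap,
        /* A real implementation would vfree() the buffer in ->release. */
};

/* Hypothetical KVM_ALLOC_TLB_CONFIG handler: hand userspace an fd
 * backed by driver-owned, mmap-able memory. */
static long kvm_alloc_tlb_config(unsigned long size)
{
        void *buf = vmalloc_user(size); /* zeroed, mmap-able memory */

        if (!buf)
                return -ENOMEM;

        return anon_inode_getfd("kvm-tlb-config", &tlb_config_fops,
                                buf, O_RDWR);
}

Userspace would then just do
mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0).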

