On 19.07.2011, at 10:36, Johannes Weiner wrote:

> On Mon, Jul 18, 2011 at 11:44:02PM +0200, Alexander Graf wrote:
>>
>> On 18.07.2011, at 20:08, Scott Wood wrote:
>>
>>> On Mon, 18 Jul 2011 18:33:58 +0200
>>> Alexander Graf <agraf@xxxxxxx> wrote:
>>>
>>>>
>>>> On 18.07.2011, at 18:18, Scott Wood wrote:
>>>>
>>>>> They're pinned by get_user_pages_fast(). We (potentially) write to
>>>>> them, so we should mark them dirty, because they are dirty. It's up
>>>>> to the rest of Linux what to do with that. Will being pinned stop
>>>>> updates from being written out if it is file-backed? And eventually
>>>>> the vm will be destroyed (or the tlb reconfigured) and the pages
>>>>> will be unpinned.
>>>>
>>>> Hrm. How much overhead do we add to the exit-to-userspace path with this?
>>>
>>> Not sure -- probably not too much for anonymous memory, compared to the
>>> rest of the cost of a heavyweight exit. On e500 the tlb array is 4 pages,
>>> and each set_page_dirty_lock() will just do a few bit operations.
>>
>> Hm, ok.
>>
>>>
>>>> I completely agree that we should mark them dirty when closing, but I'm
>>>> not fully convinced that "we dirty them, so we should declare them dirty
>>>> at random times" pays off against possible substantial slowdowns due to
>>>> the marking. Keep in mind that this is the MMIO case, which isn't _that_
>>>> rare.
>>>
>>> If we can convince ourselves nothing bad can happen, fine. I did it here
>>> because this is the point at which the API says the contents of the memory
>>> are well-defined. If it is file-backed, and userspace does a sync on a
>>> heavyweight exit, shouldn't the right thing get written to disk? Could
>>> any other weird things happen? I'm not familiar enough with that part of
>>> the kernel to say right away that it's safe.
>>
>> Neither am I; this is pretty subtle ground. CC'ing Andrea and
>> Johannes. Guys, would you please take a look at that patch and tell
>> us whether what's being done here is safe and a good idea?
>>
>> We're talking about the following patch:
>> http://www.spinics.net/lists/kvm-ppc/msg02961.html
>> and specifically about:
>>
>>> +void kvmppc_core_heavy_exit(struct kvm_vcpu *vcpu)
>>> +{
>>> +	struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
>>> +	int i;
>>> +
>>> +	/*
>>> +	 * We may have modified the guest TLB, so mark it dirty.
>>> +	 * We only do it on an actual return to userspace, to avoid
>>> +	 * adding more overhead to getting scheduled out -- and avoid
>>> +	 * any locking issues with getting preempted in the middle of
>>> +	 * KVM_CONFIG_TLB, etc.
>>> +	 */
>>> +
>>> +	for (i = 0; i < vcpu_e500->num_shared_tlb_pages; i++)
>>> +		set_page_dirty_lock(vcpu_e500->shared_tlb_pages[i]);
>>> }
>>
>> The background is that we want to have a few pages shared between
>> kernel and user space to keep guest TLB information in. It's too
>> much to shove around every time we need it in user space.
>
> Is there a strict requirement that these pages originate from
> userspace? Usually, shared memory between kernel and userspace is
> owned by the driver and kept away from the mm subsystem completely.
>
> You could allocate the memory in the driver when userspace issues an
> ioctl like KVM_ALLOC_TLB_CONFIG and return a file handle that can be
> mmap'd. The array length info is maintained in the vma for the
> kernel; userspace must remember the size of mmap regions anyway.

Hrm. What's the advantage of doing it this way around vs. the other?
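
For reference, a minimal sketch of the pinning side the thread assumes
(not the actual patch; the function name and the uaddr argument are made
up, the fields match the quoted code): at KVM_CONFIG_TLB time the guest
TLB array passed in from userspace would be pinned with
get_user_pages_fast(), so the kernel can write guest TLB state into it
at any time -- which is why the pages get set_page_dirty_lock() on the
way back to userspace.

/* Illustrative sketch only; needs <linux/mm.h> for
 * get_user_pages_fast()/put_page(). */
static int kvmppc_e500_pin_tlb(struct kvmppc_vcpu_e500 *vcpu_e500,
			       unsigned long uaddr, int num_pages)
{
	int ret;

	/* Pin the userspace pages for writing; they stay pinned until
	 * the vcpu is destroyed or the TLB is reconfigured. */
	ret = get_user_pages_fast(uaddr, num_pages, 1,
				  vcpu_e500->shared_tlb_pages);
	if (ret < 0)
		return ret;

	if (ret != num_pages) {
		/* Partial pin: drop what we got and fail. */
		while (ret--)
			put_page(vcpu_e500->shared_tlb_pages[ret]);
		return -EFAULT;
	}

	vcpu_e500->num_shared_tlb_pages = num_pages;
	return 0;
}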
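
And a rough sketch, under the same caveats, of the driver-owned
alternative Johannes describes: the kernel allocates the TLB pages
itself (e.g. with alloc_page(GFP_KERNEL) in a KVM_ALLOC_TLB_CONFIG-style
ioctl) and hands them to userspace through an mmap op on the returned
file handle. The handler name below is invented for illustration.

static int kvmppc_e500_tlb_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct kvmppc_vcpu_e500 *vcpu_e500 = file->private_data;
	unsigned long npages = vma_pages(vma);
	unsigned long addr = vma->vm_start;
	unsigned long i;
	int ret;

	if (npages > vcpu_e500->num_shared_tlb_pages)
		return -EINVAL;

	/* Map the kernel-allocated pages into the caller's address space. */
	for (i = 0; i < npages; i++, addr += PAGE_SIZE) {
		ret = vm_insert_page(vma, addr, vcpu_e500->shared_tlb_pages[i]);
		if (ret)
			return ret;
	}

	return 0;
}

With that layout the pages are owned by the driver for their whole
lifetime and, as Johannes says, stay away from the mm subsystem, so the
question of marking them dirty on heavyweight exit wouldn't arise.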
Alex