Re: [PATCH v5 5/5] KVM: PPC: e500: MMU API

Alexander Graf <agraf@xxxxxxx> · Sun, 24 Jul 2011 11:16:32 +0200

On 19.07.2011, at 13:20, Johannes Weiner wrote:

> On Tue, Jul 19, 2011 at 10:51:40AM +0200, Alexander Graf wrote:
>> 
>> On 19.07.2011, at 10:36, Johannes Weiner wrote:
>> 
>>> On Mon, Jul 18, 2011 at 11:44:02PM +0200, Alexander Graf wrote:
>>>> 
>>>> On 18.07.2011, at 20:08, Scott Wood wrote:
>>>> 
>>>>> On Mon, 18 Jul 2011 18:33:58 +0200
>>>>> Alexander Graf <agraf@xxxxxxx> wrote:
>>>>> 
>>>>>> 
>>>>>> On 18.07.2011, at 18:18, Scott Wood wrote:
>>>>>> 
>>>>>>> They're pinned by get_user_pages_fast().  We (potentially) write to them, so
>>>>>>> we should mark them dirty, because they are dirty.  It's up to the rest
>>>>>>> of Linux what to do with that.  Will being pinned stop updates from being
>>>>>>> written out if it is file-backed?  And eventually the vm will be destroyed
>>>>>>> (or the tlb reconfigured) and the pages will be unpinned.
>>>>>> 
>>>>>> Hrm. How much overhead do we add to the exit-to-userspace path with this?
>>>>> 
>>>>> Not sure -- probably not too much for anonymous memory, compared to the
>>>>> rest of the cost of a heavyweight exit.  On e500 the tlb array is 4 pages,
>>>>> and each set_page_dirty_lock() will just do a few bit operations.
>>>> 
>>>> Hm, ok.
>>>> 
>>>>> 
>>>>>> I completely agree that we should mark them dirty when closing, but I'm
>>>>>> not fully convinced a "we dirty them so we should declare them dirty at
>>>>>> random times" pays off against possible substantial slowdowns due to the
>>>>>> marking. Keep in mind that this is the MMIO case which isn't _that_ seldom.
>>>>> 
>>>>> If we can convince ourselves nothing bad can happen, fine.  I did it here
>>>>> because this is the point at which the API says the contents of the memory
>>>>> are well-defined.  If it is file-backed, and userspace does a sync on a
>>>>> heavyweight exit, shouldn't the the right thing get written to disk?  Could
>>>>> any other weird things happen?  I'm not familiar enough with that part of
>>>>> the kernel to say right away that it's safe.
>>>> 
>>>> I'm neither, these are pretty subtile grounds. CC'ing Andrea and
>>>> Johannes. Guys, would you please take a look at that patch and tell
>>>> us if it's safe and a good thing to do what's being done here?
>>>> 
>>>> We're talking about the following patch: http://www.spinics.net/lists/kvm-ppc/msg02961.html
>>>> and specifically about:
>>>> 
>>>>> +void kvmppc_core_heavy_exit(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> +	struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu);
>>>>> +	int i;
>>>>> +
>>>>> +	/*
>>>>> +	 * We may have modified the guest TLB, so mark it dirty.
>>>>> +	 * We only do it on an actual return to userspace, to avoid
>>>>> +	 * adding more overhead to getting scheduled out -- and avoid
>>>>> +	 * any locking issues with getting preempted in the middle of
>>>>> +	 * KVM_CONFIG_TLB, etc.
>>>>> +	 */
>>>>> +
>>>>> +	for (i = 0; i < vcpu_e500->num_shared_tlb_pages; i++)
>>>>> +		set_page_dirty_lock(vcpu_e500->shared_tlb_pages[i]);
>>>>> }
>>>> 
>>>> The background is that we want to have a few shared pages between
>>>> kernel and user space to keep guest TLB information in. It's too
>>>> much to shove around every time we need it in user space.
>>> 
>>> Is there a strict requirement to have these pages originate from
>>> userspace?  Usually, shared memory between kernel and userspace is
>>> owned by the driver and kept away from the mm subsystem completely.
>>> 
>>> You could allocate the memory in the driver when userspace issues an
>>> ioctl like KVM_ALLOC_TLB_CONFIG and return a file handle that can be
>>> mmap'd.  The array length info is maintained in the vma for the
>>> kernel, userspace must remember the size of mmap regions anyway.
>> 
>> Hrm. What's the advantage of doing it this way around vs the other?
> 
> You don't have to work around the mm subsystem trying to reclaim your
> memory, maintain disk coherency that is guaranteed by the filebacked
> memory semantics etc.
> 
> If your driver provides the memory, there are much less assumptions
> from userspace that you have to consider and memory management will
> not interfere either.

Ah, thanks a lot. Scott, mind to switch this to the normal scheme then? Sounds like we don't need to dirty set by then either.

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html