On 19.07.2011, at 13:20, Johannes Weiner wrote: > On Tue, Jul 19, 2011 at 10:51:40AM +0200, Alexander Graf wrote: >> >> On 19.07.2011, at 10:36, Johannes Weiner wrote: >> >>> On Mon, Jul 18, 2011 at 11:44:02PM +0200, Alexander Graf wrote: >>>> >>>> On 18.07.2011, at 20:08, Scott Wood wrote: >>>> >>>>> On Mon, 18 Jul 2011 18:33:58 +0200 >>>>> Alexander Graf <agraf@xxxxxxx> wrote: >>>>> >>>>>> >>>>>> On 18.07.2011, at 18:18, Scott Wood wrote: >>>>>> >>>>>>> They're pinned by get_user_pages_fast(). We (potentially) write to them, so >>>>>>> we should mark them dirty, because they are dirty. It's up to the rest >>>>>>> of Linux what to do with that. Will being pinned stop updates from being >>>>>>> written out if it is file-backed? And eventually the vm will be destroyed >>>>>>> (or the tlb reconfigured) and the pages will be unpinned. >>>>>> >>>>>> Hrm. How much overhead do we add to the exit-to-userspace path with this? >>>>> >>>>> Not sure -- probably not too much for anonymous memory, compared to the >>>>> rest of the cost of a heavyweight exit. On e500 the tlb array is 4 pages, >>>>> and each set_page_dirty_lock() will just do a few bit operations. >>>> >>>> Hm, ok. >>>> >>>>> >>>>>> I completely agree that we should mark them dirty when closing, but I'm >>>>>> not fully convinced a "we dirty them so we should declare them dirty at >>>>>> random times" pays off against possible substantial slowdowns due to the >>>>>> marking. Keep in mind that this is the MMIO case which isn't _that_ seldom. >>>>> >>>>> If we can convince ourselves nothing bad can happen, fine. I did it here >>>>> because this is the point at which the API says the contents of the memory >>>>> are well-defined. If it is file-backed, and userspace does a sync on a >>>>> heavyweight exit, shouldn't the the right thing get written to disk? Could >>>>> any other weird things happen? I'm not familiar enough with that part of >>>>> the kernel to say right away that it's safe. >>>> >>>> I'm neither, these are pretty subtile grounds. CC'ing Andrea and >>>> Johannes. Guys, would you please take a look at that patch and tell >>>> us if it's safe and a good thing to do what's being done here? >>>> >>>> We're talking about the following patch: http://www.spinics.net/lists/kvm-ppc/msg02961.html >>>> and specifically about: >>>> >>>>> +void kvmppc_core_heavy_exit(struct kvm_vcpu *vcpu) >>>>> +{ >>>>> + struct kvmppc_vcpu_e500 *vcpu_e500 = to_e500(vcpu); >>>>> + int i; >>>>> + >>>>> + /* >>>>> + * We may have modified the guest TLB, so mark it dirty. >>>>> + * We only do it on an actual return to userspace, to avoid >>>>> + * adding more overhead to getting scheduled out -- and avoid >>>>> + * any locking issues with getting preempted in the middle of >>>>> + * KVM_CONFIG_TLB, etc. >>>>> + */ >>>>> + >>>>> + for (i = 0; i < vcpu_e500->num_shared_tlb_pages; i++) >>>>> + set_page_dirty_lock(vcpu_e500->shared_tlb_pages[i]); >>>>> } >>>> >>>> The background is that we want to have a few shared pages between >>>> kernel and user space to keep guest TLB information in. It's too >>>> much to shove around every time we need it in user space. >>> >>> Is there a strict requirement to have these pages originate from >>> userspace? Usually, shared memory between kernel and userspace is >>> owned by the driver and kept away from the mm subsystem completely. >>> >>> You could allocate the memory in the driver when userspace issues an >>> ioctl like KVM_ALLOC_TLB_CONFIG and return a file handle that can be >>> mmap'd. The array length info is maintained in the vma for the >>> kernel, userspace must remember the size of mmap regions anyway. >> >> Hrm. What's the advantage of doing it this way around vs the other? > > You don't have to work around the mm subsystem trying to reclaim your > memory, maintain disk coherency that is guaranteed by the filebacked > memory semantics etc. > > If your driver provides the memory, there are much less assumptions > from userspace that you have to consider and memory management will > not interfere either. Ah, thanks a lot. Scott, mind to switch this to the normal scheme then? Sounds like we don't need to dirty set by then either. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html