On Wed, Jun 17, 2020 at 06:14:52PM +0200, Christian Borntraeger wrote:
> 
> On 17.06.20 18:06, Peter Xu wrote:
> > Hi, Christian,
> > 
> > On Wed, Jun 17, 2020 at 08:19:29AM +0200, Christian Borntraeger wrote:
> >> 
> >> On 16.06.20 18:35, Peter Xu wrote:
> >>> Hi, Alexander,
> >>> 
> >>> On Tue, Jun 16, 2020 at 05:59:33PM +0200, Alexander Gordeev wrote:
> >>>>> @@ -489,21 +489,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> >>>>>         if (unlikely(fault & VM_FAULT_ERROR))
> >>>>>                 goto out_up;
> >>>>> 
> >>>>> -       /*
> >>>>> -        * Major/minor page fault accounting is only done on the
> >>>>> -        * initial attempt. If we go through a retry, it is extremely
> >>>>> -        * likely that the page will be found in page cache at that point.
> >>>>> -        */
> >>>>>         if (flags & FAULT_FLAG_ALLOW_RETRY) {
> >>>>> -               if (fault & VM_FAULT_MAJOR) {
> >>>>> -                       tsk->maj_flt++;
> >>>>> -                       perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
> >>>>> -                                     regs, address);
> >>>>> -               } else {
> >>>>> -                       tsk->min_flt++;
> >>>>> -                       perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
> >>>>> -                                     regs, address);
> >>>>> -               }
> >>>>>                 if (fault & VM_FAULT_RETRY) {
> >>>>>                         if (IS_ENABLED(CONFIG_PGSTE) && gmap &&
> >>>>>                             (flags & FAULT_FLAG_RETRY_NOWAIT)) {
> > 
> > [1]
> > 
> >>>> 
> >>>> Seems like the call to mm_fault_accounting() will be missed if
> >>>> we entered here with the FAULT_FLAG_RETRY_NOWAIT flag set, since it
> >>>> jumps to "out_up"...
> >>> 
> >>> This is true as a functional change.  However, that also means that we've got
> >>> a VM_FAULT_RETRY, which hints that this fault has been requested to retry
> >>> rather than handled correctly (for instance, due to some try_lock failing
> >>> during the fault process).
> >>> 
> >>> To me, that case should not be counted as a page fault at all, should it?
> >>> Otherwise we might get the same duplicated accounting when the page fault is
> >>> retried from higher up the stack.
> >>> 
> >>> Thanks
> >> 
> >> This case below (the one with the gmap) is the KVM case for doing a so-called
> >> pseudo page fault to our guests (we notify our guests about major host page
> >> faults and let them reschedule to something else instead of halting the vcpu).
> >> This is resolved asynchronously by KVM code with either gup or
> >> fixup_user_fault (it can also be synchronous when the guest does not meet
> >> some conditions).  We do not change the counters in that code as far as I can
> >> tell, so we should continue to do it here.
> >> 
> >> (see arch/s390/kvm/kvm-s390.c
> >> static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
> >> {
> >> [...]
> >>         } else if (current->thread.gmap_pfault) {
> >>                 trace_kvm_s390_major_guest_pfault(vcpu);
> >>                 current->thread.gmap_pfault = 0;
> >>                 if (kvm_arch_setup_async_pf(vcpu))
> >>                         return 0;
> >>                 return kvm_arch_fault_in_page(vcpu, current->thread.gmap_addr, 1);
> >>         }
> > 
> > Please correct me if I'm wrong... but I still think what this patch does is
> > the right thing to do.
> > 
> > Note again that IMHO when we reach [1] above it means the page fault has not
> > been handled yet, so we need to fall back to the KVM async page fault, and we
> > shouldn't increment the accounting until it's finally handled correctly.
> > That final accounting should be done in the async pf path in the gup code
> > where the page fault is handled:
> > 
> >   kvm_arch_fault_in_page
> >     gmap_fault
> >       fixup_user_fault
> > 
> > Where in fixup_user_fault() we have:
> > 
> >         if (tsk) {
> >                 if (major)
> >                         tsk->maj_flt++;
> >                 else
> >                         tsk->min_flt++;
> >         }
> > 
> 
> Right, that case does work.  It's the case where we do not inject a pseudo
> pagefault and instead fall back to synchronous fault-in.
> What about the other case:
> 
> kvm_setup_async_pf
>  ->workqueue
>    async_pf_execute
>      get_user_pages_remote
> 
> Does get_user_pages_remote do the accounting as well?  I can't see that.
> 

It should, with:

  get_user_pages_remote
    __get_user_pages_remote
      __get_user_pages_locked
        __get_user_pages
          faultin_page

Where the accounting is done in faultin_page().

We probably need to move the accounting in faultin_page() to after the retry
check as well, but that should be irrelevant to this patch.

Thanks!

-- 
Peter Xu
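
Two illustrative sketches of the accounting paths discussed above; the exact
code in the posted series may differ, so treat these as sketches rather than
the patch itself.

First, the per-architecture accounting that the quoted s390 hunk removes is
essentially the logic below.  A shared helper such as the
mm_fault_accounting() mentioned by Alexander would bundle it in one place; the
signature here is assumed for illustration only:

        /*
         * Assumed shape of a shared fault-accounting helper: bump the
         * per-task counters and emit the matching perf software event
         * exactly once per successfully handled fault.
         */
        static void mm_fault_accounting(struct task_struct *tsk,
                                        struct pt_regs *regs,
                                        unsigned long address, bool major)
        {
                if (major) {
                        tsk->maj_flt++;
                        perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
                                      regs, address);
                } else {
                        tsk->min_flt++;
                        perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
                                      regs, address);
                }
        }

Second, moving the accounting in faultin_page() to after the retry check could
look roughly like the fragment below, simplified from the upstream
faultin_page() of that time (flag setup omitted; not the actual patch).  The
point is that a VM_FAULT_RETRY attempt returns early without being counted, so
only the attempt that really completes the fault bumps the counters:

        ret = handle_mm_fault(vma, address, fault_flags);
        if (ret & VM_FAULT_ERROR) {
                int err = vm_fault_to_errno(ret, *flags);

                if (err)
                        return err;
                BUG_ON(ret & VM_FAULT_RETRY);
        }

        if (ret & VM_FAULT_RETRY) {
                /* Caller will retry the fault; do not account this attempt. */
                if (locked && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
                        *locked = 0;
                return -EBUSY;
        }

        /* Account only once the fault has actually been handled. */
        if (tsk) {
                if (ret & VM_FAULT_MAJOR)
                        tsk->maj_flt++;
                else
                        tsk->min_flt++;
        }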