Re: [PATCH 1/6] mm: Allow per-VMA locks on file-backed VMAs

Suren Baghdasaryan <surenb@xxxxxxxxxx> · Fri, 7 Apr 2023 15:49:06 -0700

On Fri, Apr 7, 2023 at 3:40 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Fri, Apr 7, 2023 at 2:36 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, Apr 07, 2023 at 01:26:08PM -0700, Suren Baghdasaryan wrote:
> > > True. do_swap_page() has the same issue. Can we move these
> > > count_vm_event() calls to the end of handle_mm_fault():
> >
> > I was going to suggest something similar, so count that as an
> > Acked-by.  This will change the accounting for the current retry
> > situation, where we drop the mmap_lock in filemap_fault(), initiate I/O
> > and return RETRY.  I think that's probably a good change though; I don't
> > see why applications should have their fault counters incremented twice
> > for that situation.
> >
> > >        mm_account_fault(regs, address, flags, ret);
> > > +out:
> > > +       if (ret != VM_FAULT_RETRY) {
> > > +              count_vm_event(PGFAULT);
> > > +              count_memcg_event_mm(vma->vm_mm, PGFAULT);
> > > +       }
> >
> > Nit: this is a bitmask, so we should probably have:
> >
> >         if (!(ret & VM_FAULT_RETRY)) {
> >
> > in case somebody's ORed it with VM_FAULT_MAJOR or something.
> >
> > Hm, I wonder if we're handling that correctly; if we need to start I/O,
> > do we preserve VM_FAULT_MAJOR returned from the first attempt?  Not
> > in a position where I can look at the code right now.
>
> Interesting question. IIUC mm_account_fault() is the place where
> VM_FAULT_MAJOR is used to perform minor/major fault accounting.
> However it has an early exit:
>
>         /*
>          * We don't do accounting for some specific faults:
>          *
>          * - Unsuccessful faults (e.g. when the address wasn't valid).  That
>          *   includes arch_vma_access_permitted() failing before reaching here.
>          *   So this is not a "this many hardware page faults" counter.  We
>          *   should use the hw profiling for that.
>          *
>          * - Incomplete faults (VM_FAULT_RETRY).  They will only be counted
>          *   once they're completed.
>          */
>         if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
>                 return;
>
> So, VM_FAULT_RETRY is ignored in the accounting path.
> For the current do_swap_page (the only user returning VM_FAULT_RETRY
> this late) it's fine because we did not really do anything. However
> with your series and with my current attempts to drop the vma lock, do
> swapin_readahead() and return VM_FAULT_RETRY the situation changes. We
> need to register such (VM_FAULT_MAJOR | VM_FAULT_RETRY) case as a
> major fault, otherwise after retry it will be accounted as only a
> minor fault. Accounting the retry as a minor fault after the original
> major fault is technically also double-counting but probably not as
> big of a problem as missing the major fault accounting.

Actually, looks like in most cases where VM_FAULT_MAJOR is returned we
also do count_vm_event(PGMAJFAULT) and
count_memcg_event_mm(vma->vm_mm, PGMAJFAULT). mm_account_fault()
modifies only current->{maj|min}_flt and perf_sw_event().