On Mon, Jun 15, 2020 at 3:16 PM Peter Xu <peterx@xxxxxxxxxx> wrote: > > This series tries to address all of them by introducing mm_fault_accounting() > first, so that we move all the page fault accounting into the common code base, > then call it properly from arch pf handlers just like handle_mm_fault(). Hmm. So having looked at this a bit more, I'd actually like to go even further, and just get rid of the per-architecture code _entirely_. Here's a straw-man patch to the generic code - the idea is mostly laid out in the comment that I'm just quoting here directly too: /* * Do accounting in the common code, to avoid unnecessary * architecture differences or duplicated code. * * We arbitrarily make the rules be: * * - faults that never even got here (because the address * wasn't valid). That includes arch_vma_access_permitted() * failing above. * * So this is expressly not a "this many hardware page * faults" counter. Use the hw profiling for that. * * - incomplete faults (ie RETRY) do not count (see above). * They will only count once completed. * * - the fault counts as a "major" fault when the final * successful fault is VM_FAULT_MAJOR, or if it was a * retry (which implies that we couldn't handle it * immediately previously). * * - if the fault is done for GUP, regs wil be NULL and * no accounting will be done (but you _could_ pass in * your own regs and it would be accounted to the thread * doing the fault, not to the target!) */ the code itself in the patch is (a) pretty trivial and self-evident (b) INCOMPLETE that (b) is worth noting: this patch won't compile on its own. It intentionally leaves all the users without the new 'regs' argument, because you obviously simply need to remove all the code that currently tries to do any accounting. Comments? This is a bigger change, but I think it might be worth it to _really_ consolidate the major/minor logic. One detail worth noting: I do wonder if we should put the perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr); just in the arch code at the top of the fault handling, and consider it entirely unrelated to the major/minor fault handling. The major/minor faults fundamnetally are about successes. But the plain PERF_COUNT_SW_PAGE_FAULTS could be about things that fail, including things that never even get to this point at all. I'm not convinced it's useful to have three SW events that are defined to be A=B+C. But this does *not* do that part. It' sreally just an RFC patch. Linus
Attachment:
patch
Description: Binary data