Dave Hansen wrote:
> On 2/23/24 12:39, John Groves wrote:
> >> We had similar unit test regression concerns with fsdax where some
> >> upstream change silently broke PMD faults. The solution there was trace
> >> points in the fault handlers and a basic test that knows apriori that it
> >> *should* be triggering a certain number of huge faults:
> >>
> >> https://github.com/pmem/ndctl/blob/main/test/dax.sh#L31
> > Good approach, thanks Dan! My working assumption is that we'll be able to make
> > that approach work in the famfs tests. So the fault counters should go away
> > in the next version.
>
> I do really suspect there's something more generic that should be done
> here. Maybe we need a generic 'huge_faults' perf event to pair up with
> the good ol' faults that we already have:
>
> # perf stat -e faults /bin/ls
>
>  Performance counter stats for '/bin/ls':
>
>                104      faults
>
>
>        0.001499862 seconds time elapsed
>
>        0.001490000 seconds user
>        0.000000000 seconds sys

Certainly something like that would have satisfied this sanity test use
case. I will note that mm_account_fault() would need some help to figure
out the size of the page table entry that got installed.

Maybe extensions to vm_fault_reason to add VM_FAULT_P*D? That
complements VM_FAULT_FALLBACK to indicate whether, for example, the
fallback went from PUD to PMD, or all the way back to PTE.

Then use cases like this could just add a dynamic probe in
mm_account_fault(). No real need for a new tracepoint unless there was a
use case for this outside of regression testing fault handlers, right?
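For illustration only, the kind of sanity check discussed in this thread (run a workload, then compare an event count against a number known in advance) might look roughly like the sketch below. It is not the ndctl or famfs test. The helper names parse_perf_count() and check_fault_count() are hypothetical, the sample text is the perf stat output quoted above, and a 'huge_faults' event is assumed not to exist yet, so only the existing 'faults' event is parsed.

```python
import re

# Sample text copied from the `perf stat -e faults /bin/ls` output quoted above.
SAMPLE = """\
 Performance counter stats for '/bin/ls':

               104      faults

       0.001499862 seconds time elapsed
"""


def parse_perf_count(text, event):
    """Return the count reported for `event` in perf-stat-style text, or None.

    Hypothetical helper: it expects lines of the form "<count> <event>" and
    tolerates thousands separators such as "1,024".
    """
    pattern = re.compile(
        r"^\s*([\d,]+)\s+" + re.escape(event) + r"\b", re.MULTILINE
    )
    match = pattern.search(text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def check_fault_count(text, event, expected):
    """Return True if `event` was counted exactly `expected` times.

    Raises ValueError when the event does not appear at all, so a missing
    counter is not mistaken for a count of zero.
    """
    count = parse_perf_count(text, event)
    if count is None:
        raise ValueError(f"event {event!r} not found in perf output")
    return count == expected


if __name__ == "__main__":
    print(parse_perf_count(SAMPLE, "faults"))
```

If a huge-fault event were ever added, the same check could be pointed at it by passing that event name instead of "faults". Until then, the lookup simply returns None for it.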