On 24/02/23 10:23AM, Dave Hansen wrote:
> On 2/23/24 09:42, John Groves wrote:
> > One of the key requirements for famfs is that it service vma faults
> > efficiently. Our metadata helps - the search order is n for n extents,
> > and n is usually 1. But we can still observe gnarly lock contention
> > in mm if PTE faults are happening. This commit introduces fault counters
> > that can be enabled and read via /sys/fs/famfs/...
> >
> > These counters have proved useful in troubleshooting situations where
> > PTE faults were happening instead of PMD. No performance impact when
> > disabled.
>
> This seems kinda wonky. Why does _this_ specific filesystem need its
> own fault counters. Seems like something we'd want to do much more
> generically, if it is needed at all.
>
> Was the issue here just that vm_ops->fault() was getting called instead
> of ->huge_fault()? Or something more subtle?

Thanks for your reply Dave!

First, I'm willing to pull the fault counters out if the brain trust doesn't
like them.

I put them in because we were running benchmarks of computational data
analytics and noted that jobs took 3x as long on famfs as on raw dax - which
indicated I was doing something wrong, because it should be equivalent or
very close.

The solution was to call thp_get_unmapped_area() in famfs_file_operations,
and performance doesn't vary significantly from raw dax now. Prior to that
I wasn't making sure the mmap address was PMD aligned.

After that I wanted a way to be double-secret-certain that it was servicing
PMD faults as intended. Which it basically always is, so far. (The smoke
tests in user space check this.)

John
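
For reference, the PMD-alignment condition mentioned above can be sketched
in user space roughly as follows. This is only an illustration, not the
famfs smoke-test code: the helper names and the 2 MiB PMD size (x86-64 with
4 KiB base pages) are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: PMD mappings cover 2 MiB, as on x86-64 with 4 KiB pages. */
#define PMD_SIZE_ASSUMED ((uintptr_t)2 << 20)

/* Hypothetical helper: true if addr sits on a PMD (2 MiB) boundary. */
static bool is_pmd_aligned(uintptr_t addr)
{
	return (addr & (PMD_SIZE_ASSUMED - 1)) == 0;
}

/* Hypothetical helper: round addr up to the next PMD boundary. */
static uintptr_t pmd_align_up(uintptr_t addr)
{
	return (addr + PMD_SIZE_ASSUMED - 1) & ~(PMD_SIZE_ASSUMED - 1);
}
```

A test could pass the address returned by mmap() (cast to uintptr_t) to
is_pmd_aligned(). An unaligned mapping cannot be backed by PMD entries, so
it would be expected to take PTE faults instead.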