The VFS always uses coarse-grained timestamps when updating the ctime
and mtime after a change. This has the benefit of allowing filesystems
to optimize away a lot of metadata updates, down to around 1 per jiffy,
even when a file is under heavy writes.

Unfortunately, this coarseness has always been an issue when we're
exporting via NFSv3, which relies on timestamps to validate caches. A
lot of changes can happen in a jiffy, so timestamps aren't sufficient
to help the client decide to invalidate the cache.

Even with NFSv4, a lot of exported filesystems don't properly support a
change attribute and are subject to the same problems with timestamp
granularity. Other applications (e.g., backup applications) have
similar issues with timestamps.

If we were to always use fine-grained timestamps, that would improve
the situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.

What we need is a way to only use fine-grained timestamps when they are
being actively queried. The idea is to use an unused bit in the ctime's
tv_nsec field to mark when the mtime or ctime has been queried via
getattr. Once that has been marked, the next m/ctime update will use a
fine-grained timestamp.

The original merge of multigrain timestamps for v6.6 had to be
reverted, as a file with a coarse-grained timestamp could incorrectly
appear to be modified before a file with a fine-grained timestamp, when
that wasn't the case.

This revision solves that problem by making it so that when a
fine-grained timespec64 is handed out, that value becomes the floor for
further coarse-grained timespec64 fetches. This requires new timekeeper
interfaces with a potential downside: when a file is stamped with a
fine-grained timestamp, it has to (briefly) take the global timekeeper
spinlock.

Because of that, this set takes greater pains to avoid issuing new
fine-grained timestamps when possible.
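
As a rough illustration of the scheme described above, here is a small
user-space Python model. It is not the kernel implementation: FakeClock,
Inode, the choice of bit 30 for the QUERIED flag and the 4 ms tick used
in the usage note are all assumptions made for the sketch. It only shows
the three ideas together: a flag bit stored in the nanoseconds field, a
fine-grained stamp handed out only when the ctime was queried and the
coarse clock has not moved past it, and the floor that keeps later
coarse-grained fetches from going backwards.

```python
NSEC_PER_SEC = 1_000_000_000
# Assumption: valid nanosecond values are below 10**9 < 2**30, so bit 30
# is free to use as a flag. The bit chosen here is purely illustrative.
QUERIED = 1 << 30


class FakeClock:
    """Deterministic stand-in for the timekeeper (hypothetical)."""

    def __init__(self, tick_ns):
        self.now_ns = 0         # fine-grained "real" time
        self.tick_ns = tick_ns  # coarse granularity, i.e. one jiffy
        self.floor_ns = 0       # latest fine-grained value handed out

    def advance(self, ns):
        self.now_ns += ns

    def coarse(self):
        # Time as of the last tick, but never earlier than the floor.
        last_tick = self.now_ns - self.now_ns % self.tick_ns
        return max(last_tick, self.floor_ns)

    def fine(self):
        # Handing out a fine-grained value raises the floor.
        self.floor_ns = self.now_ns
        return self.now_ns


class Inode:
    """Toy inode holding only a ctime split into sec/nsec fields."""

    def __init__(self, clock):
        self.clock = clock
        self.ctime_sec = 0
        self.ctime_nsec = 0  # may carry the QUERIED flag

    def ctime_ns(self):
        # Mask the flag off before interpreting the value.
        return self.ctime_sec * NSEC_PER_SEC + (self.ctime_nsec & ~QUERIED)

    def getattr(self):
        # Remember that someone has looked at this timestamp.
        self.ctime_nsec |= QUERIED
        return self.ctime_ns()

    def update_ctime(self):
        queried = bool(self.ctime_nsec & QUERIED)
        now = self.clock.coarse()
        # Only pay for a fine-grained stamp if the old value was queried
        # and the coarse clock has not advanced past it.
        if queried and now <= self.ctime_ns():
            now = self.clock.fine()
        # Storing the new value also clears the QUERIED flag.
        self.ctime_sec, self.ctime_nsec = divmod(now, NSEC_PER_SEC)
        return now
```

With a 4 ms tick, stamping inode A, querying it, and stamping it again
1 ms later gives A a fine-grained ctime. An unqueried inode B stamped
shortly afterwards gets a coarse-grained value that is clamped to the
floor, so B does not appear older than A, which is the ordering problem
that led to the v6.6 revert.
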
A fine-grained timestamp is now only required if the current mtime or
ctime has been fetched for a getattr, and the next coarse-grained tick
has not happened yet. For any other case, a coarse-grained timestamp is
fine, and that is fetched using the seqcount.

In order to get some hard numbers about how often the lock would be
taken, I've added a couple of percpu counters and a debugfs file for
tracking both types of multigrain timekeeper fetches.

With this, I did a kdevops fstests run on xfs (CRC mode). I ran "make
fstests-baseline" and then immediately grabbed the counter values, and
calculated the percentage of fetches that were fine-grained:

$ time make fstests-baseline
real    324m17.337s
user    27m23.213s
sys     2m40.313s

fine            3059498
coarse          383848171
pct fine        .79075661

Next I did a kdevops fstests run with NFS: one server serving 3 clients
(v4.2, v4.0 and v3). Again, I timed "make fstests-baseline" and then
grabbed the multigrain counters from the NFS server:

$ time make fstests-baseline
real    181m57.585s
user    16m8.266s
sys     1m45.864s

fine            8137657
coarse          44726007
pct fine        15.393668

We can't run as many tests on nfs as on xfs, so the run is shorter.
nfsd is a very getattr-heavy workload, and the clients aggressively
coalesce writes, so this is probably something of a pessimal case for
the number of fine-grained timestamps over time.

At this point I'm mainly wondering whether (briefly) taking the
timekeeper spinlock in this codepath is unreasonable. It does very
little work under it, so I'm hoping the impact would be unmeasurable
for most workloads.

Side Q: what's the best tool for measuring spinlock contention? It'd be
interesting to see how often (and how long) we end up spinning on this
lock under different workloads.

Note that some of the patches in the series are virtually identical to
the ones before. I stripped the prior Reviewed-by/Acked-by tags though,
since the underlying infrastructure has changed a bit.

Comments and suggestions welcome.

Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
Jeff Layton (9):
      fs: switch timespec64 fields in inode to discrete integers
      timekeeping: new interfaces for multigrain timestamp handing
      timekeeping: add new debugfs file to count multigrain timestamps
      fs: add infrastructure for multigrain timestamps
      fs: have setattr_copy handle multigrain timestamps appropriately
      xfs: switch to multigrain timestamps
      ext4: switch to multigrain timestamps
      btrfs: convert to multigrain timestamps
      tmpfs: add support for multigrain timestamps

 fs/attr.c                           |  52 ++++++++++++++--
 fs/btrfs/file.c                     |  25 ++------
 fs/btrfs/super.c                    |   5 +-
 fs/ext4/super.c                     |   2 +-
 fs/inode.c                          |  70 ++++++++++++++++++++-
 fs/stat.c                           |  41 ++++++++++++-
 fs/xfs/libxfs/xfs_trans_inode.c     |   6 +-
 fs/xfs/xfs_iops.c                   |  10 +--
 fs/xfs/xfs_super.c                  |   2 +-
 include/linux/fs.h                  |  85 ++++++++++++++++++--------
 include/linux/timekeeper_internal.h |   2 +
 include/linux/timekeeping.h         |   4 ++
 kernel/time/timekeeping.c           | 117 ++++++++++++++++++++++++++++++++++++
 mm/shmem.c                          |   2 +-
 14 files changed, 352 insertions(+), 71 deletions(-)
---
base-commit: 12cd44023651666bd44baa36a5c999698890debb
change-id: 20231016-mgtime-fe3ea75c6f59

Best regards,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>