On Wed 20-09-23 12:30:52, Christian Brauner wrote:
> On Wed, Sep 20, 2023 at 12:17:31PM +0200, Jan Kara wrote:
> > On Wed 20-09-23 10:41:30, Christian Brauner wrote:
> > > > > f1 was last written to *after* f2 was last written to. If the timestamp of f1
> > > > > is then lower than the timestamp of f2, timestamps are fundamentally broken.
> > > > >
> > > > > Many things in user-space depend on timestamps, such as build system
> > > > > centered around 'make', but also 'find ... -newer ...'.
> > > > >
> > > >
> > > >
> > > > What does breakage with make look like in this situation? The "fuzz"
> > > > here is going to be on the order of a jiffy. The typical case for make
> > > > timestamp comparisons is comparing source files vs. a build target. If
> > > > those are being written nearly simultaneously, then that could be an
> > > > issue, but is that a typical behavior? It seems like it would be hard to
> > > > rely on that anyway, esp. given filesystems like NFS that can do lazy
> > > > writeback.
> > > >
> > > > One of the operating principles with this series is that timestamps can
> > > > be of varying granularity between different files. Note that Linux
> > > > already violates this assumption when you're working across filesystems
> > > > of different types.
> > > >
> > > > As to potential fixes if this is a real problem:
> > > >
> > > > I don't really want to put this behind a mount or mkfs option (a'la
> > > > relatime, etc.), but that is one possibility.
> > > >
> > > > I wonder if it would be feasible to just advance the coarse-grained
> > > > current_time whenever we end up updating a ctime with a fine-grained
> > > > timestamp? It might produce some inode write amplification. Files that
> > >
> > > Less than ideal imho.
> > >
> > > If this risks breaking existing workloads by enabling it unconditionally
> > > and there isn't a clear way to detect and handle these situations
> > > without risk of regression then we should move this behind a mount
> > > option.
> > >
> > > So how about the following:
> > >
> > > From cb14add421967f6e374eb77c36cc4a0526b10d17 Mon Sep 17 00:00:00 2001
> > > From: Christian Brauner <brauner@xxxxxxxxxx>
> > > Date: Wed, 20 Sep 2023 10:00:08 +0200
> > > Subject: [PATCH] vfs: move multi-grain timestamps behind a mount option
> > >
> > > While we initially thought we can do this unconditionally it turns out
> > > that this might break existing workloads that rely on timestamps in very
> > > specific ways and we always knew this was a possibility. Move
> > > multi-grain timestamps behind a vfs mount option.
> > >
> > > Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>
> >
> > Surely this is a safe choice as it moves the responsibility to the sysadmin
> > and the cases where finegrained timestamps are required. But I kind of
> > wonder how is the sysadmin going to decide whether mgtime is safe for his
> > system or not? Because the possible breakage needn't be obvious at the
> > first sight... If I were a sysadmin, I'd rather opt for something like
>
> I think you'll basically enable this because you want to export a
> filesystem via NFS.

OK, that's what I thought but then you have to make a tough choice between:
1) Possibly inconsistent NFS caches on frequent changes.
2) Possibly broken builds on NFS.

Pick your poison ;)

> > finegrained timestamps + lazytime (if I needed the finegrained timestamps
> > functionality). That should avoid the IO overhead of finegrained timestamps
>
> That would work with this patch, no? Or are you saying it would need
> something else?

Sorry, I was not really precise here. What I meant was that instead of having
multigrain timestamps, I (as a sysadmin) would want the filesystem to set
sb->s_time_gran to 1 ns and use lazytime to remove the IO overhead of the
frequent timestamp updates. But that is just me brainstorming possible
solutions of the original NFS problem.

> > as well and I'd know I can have problems with timestamps only after a
> > system crash.
> >
> > I've just got another idea how we could solve the problem: Couldn't we
> > always just report coarsegrained timestamp to userspace and provide access
> > to finegrained value only to NFS which should know what it's doing?
>
> What would changes would be involved for that?

See my other email. It should be fairly small...

> If this is invasive work and we decide this is something that we want to
> do then we should remove FS_MGTIME from btrfs, xfs, ext4, and tmpfs for
> v6.6.

.. but let's see what Jeff thinks. I can miss some problem with the
solution.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR