On Wed 20-09-23 10:12:03, Jeff Layton wrote: > On Wed, 2023-09-20 at 14:48 +0200, Jan Kara wrote: > > On Wed 20-09-23 06:35:18, Jeff Layton wrote: > > > On Wed, 2023-09-20 at 12:17 +0200, Jan Kara wrote: > > > > If I were a sysadmin, I'd rather opt for something like > > > > finegrained timestamps + lazytime (if I needed the finegrained timestamps > > > > functionality). That should avoid the IO overhead of finegrained timestamps > > > > as well and I'd know I can have problems with timestamps only after a > > > > system crash. > > > > > > > I've just got another idea how we could solve the problem: Couldn't we > > > > always just report coarsegrained timestamp to userspace and provide access > > > > to finegrained value only to NFS which should know what it's doing? > > > > > > > > > > I think that'd be hard. First of all, where would we store the second > > > timestamp? We can't just truncate the fine-grained ones to come up with > > > a coarse-grained one. It might also be confusing having nfsd and local > > > filesystems present different attributes. > > > > So what I had in mind (and I definitely miss all the NFS intricacies so the > > idea may be bogus) was that inode->i_ctime would be maintained exactly as > > is now. There will be new (kernel internal at least for now) STATX flag > > STATX_MULTIGRAIN_TS. fill_mg_cmtime() will return timestamp truncated to > > sb->s_time_gran unless STATX_MULTIGRAIN_TS is set. Hence unless you set > > STATX_MULTIGRAIN_TS, there is no difference in the returned timestamps > > compared to the state before multigrain timestamps were introduced. With > > STATX_MULTIGRAIN_TS we return full precision timestamp as stored in the > > inode. Then NFS in fh_fill_pre_attrs() and fh_fill_post_attrs() needs to > > make sure STATX_MULTIGRAIN_TS is set when calling vfs_getattr() to get > > multigrain time. > > > I agree nfsd may now be presenting slightly different timestamps than user > > is able to see with stat(2) directly on the filesystem. But is that a > > problem? Essentially it is a similar solution as the mgtime mount option > > but now sysadmin doesn't have to decide on filesystem mount how to report > > timestamps but the stat caller knowingly opts into possibly inconsistent > > (among files) but high precision timestamps. And in the particular NFS > > usecase where stat is called all the time anyway, timestamps will likely > > even be consistent among files. > > > > I like this idea... > > Would we also need to raise sb->s_time_gran to something corresponding > to HZ on these filesystems? I was actually confused a bit about how timestamp_truncate() works. The jiffie granularity is just direct consequence of current_time() using ktime_get_coarse_real_ts64() and not of timestamp_truncate(). sb->s_time_gran seems to be more about the on-disk format so it doesn't seem like a great idea to touch it. So probably we can just truncate timestamps in generic_fillattr() to HZ granularity unconditionally. > If we truncate the timestamps at a granularity corresponding to HZ before > presenting them via statx and the like then that should work around the > problem with programs that compare timestamps between inodes. Exactly. > With NFSv4, when a filesystem doesn't report a STATX_CHANGE_COOKIE, nfsd > will fake one up using the ctime. It's fine for that to use a full fine- > grained timestamp since we don't expect to be able to compare that value > with one of a different inode. Yes. > I think we'd want nfsd to present the mtime/ctime values as truncated, > just like we would with a local fs. We could hit the same problem of an > earlier-looking timestamp with NFS if we try to present the actual fine- > grained values to the clients. IOW, I'm convinced that we need to avoid > this behavior in most situations. I wasn't sure if there's a way to do this within NFS - i.e., if the value communicated via NFSv3 protocol (I know v4 has a special change cookie field for it) that gets used for detecting need to revalidate file contents isn't the one presented to client's userspace as ctime. If there's a way to do this then great, I'm all for presenting truncated timestamps even for NFS. > If we do this, then we technically don't need the mount option either. Yes, that was my hope. > We could still add it though, and have it govern whether fill_mg_cmtime > truncates the timestamps before storing them in the kstat. Well, if we decide these timestamps are useful for userspace as well, I'd rather make that a userspace visible STATX flag than a mount option. So applications aware of the pitfalls can get high precision timestamps without possibly breaking unaware applications. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR