Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing

On Mon, Oct 23, 2023 at 02:18:12PM -1000, Linus Torvalds wrote:
> On Mon, 23 Oct 2023 at 13:26, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > The problem is the first read request after a modification has been
> > made. That is causing relatime to see mtime > atime and triggering
> > an atime update. XFS sees this, does an atime update, and in
> > committing that persistent inode metadata update, it calls
> > inode_maybe_inc_iversion(force = false) to check if an iversion
> > update is necessary. The VFS sees I_VERSION_QUERIED, and so it bumps
> > i_version and tells XFS to persist it.
> 
> Could we perhaps just have a mode where we don't increment i_version
> for just atime updates?
>
> Maybe we don't even need a mode, and could just decide that atime
> updates aren't i_version updates at all?

We do that already - in-memory atime updates don't bump i_version at
all. The issue is the rare persistent atime update requests that
still happen - they are the ones that trigger an i_version bump on
XFS, and one of the relatime heuristics tickles this specific issue.

If we push the problematic persistent atime updates to be in-memory
updates only, then the whole problem with i_version goes away....
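
To make the interaction concrete, here is a minimal userspace sketch of
the "bump only if queried" behaviour described above. It is a simplified
model, not the kernel code: the struct and the model_* function names
are made up for illustration, and the real implementation uses atomic
operations on inode->i_version rather than plain loads and stores.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model: bit 0 of the stored value is the "queried" flag,
 * the change counter lives in the bits above it.
 */
#define I_VERSION_QUERIED	(1ULL << 0)
#define I_VERSION_INCREMENT	(1ULL << 1)

/* Hypothetical stand-in for struct inode; only the field we need. */
struct model_inode {
	uint64_t i_version;
};

/* Reader side (e.g. nfsd): fetch the counter and mark it as queried. */
static uint64_t model_query_iversion(struct model_inode *inode)
{
	inode->i_version |= I_VERSION_QUERIED;
	return inode->i_version >> 1;
}

/*
 * Writer side: called when the filesystem commits a persistent inode
 * metadata update. Without force, the counter is bumped only if
 * somebody has looked at it since the last bump. Returns true when the
 * caller has to persist the new value.
 */
static bool model_maybe_inc_iversion(struct model_inode *inode, bool force)
{
	if (!force && !(inode->i_version & I_VERSION_QUERIED))
		return false;

	/* Bump the counter and clear the queried flag. */
	inode->i_version = (inode->i_version + I_VERSION_INCREMENT) &
			   ~I_VERSION_QUERIED;
	return true;
}
```

In this model a persistent atime update that arrives after a query
takes the bump path even though no data or other metadata changed,
which is the case being discussed. An atime update that stays in memory
never reaches model_maybe_inc_iversion() at all.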

> Yes, yes, it's obviously technically a "inode modification", but does
> anybody actually *want* atime updates with no actual other changes to
> be version events?

Well, yes, there was. That's why we defined i_version in the on-disk
format this way well over a decade ago. It was part of some deep
dark magical HSM beans that allowed the application to combine
multiple scans for different inode metadata changes into a single
pass. atime changes were one of the things it needed to know about
for tiering and space scavenging purposes....

> Or maybe i_version can update, but callers of getattr() could have two
> bits for that STATX_CHANGE_COOKIE, one for "I care about atime" and
> one for others, and we'd pass that down to inode_query_version, and
> we'd have a I_VERSION_QUERIED and a I_VERSION_QUERIED_STRICT, and the
> "I care about atime" case would set the strict one.

This makes correct behaviour reliant on the application using the
query mechanism correctly. I have my doubts that userspace
developers will be able to understand the subtle difference between
the two options and always choose correctly....

And then there's always the issue that we might end up with both
flags set and we get conflicting bug reports about how atime is not
behaving the way the applications want it to behave.

> Then inode_maybe_inc_iversion() could - for atime updates - skip the
> version update *unless* it sees that I_VERSION_QUERIED_STRICT bit.
> 
> Does that sound sane to people?

I'd much prefer we just do the right thing transparently at the
filesystem level; all we need is for the inode to be flagged that it
should be doing in-memory atime updates rather than persistent
updates.

Perhaps the NFS server should just set a new S_LAZYTIME flag on
inodes it accesses, similar to how we can set S_NOATIME on inodes to
elide atime updates altogether. Once set, the inode will behave that
way until it is reclaimed from memory....
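
As a rough sketch of what that could look like, here is a userspace
model of the atime decision. Everything in it is illustrative: the
struct, the enum and the model_* and MODEL_* names are invented, a
per-inode S_LAZYTIME flag does not exist today, and timestamps are
plain seconds. It assumes relatime is in effect and uses the usual
relatime rules: update if mtime or ctime is not older than atime, or if
atime is at least a day old.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_S_NOATIME		(1u << 0)	/* never update atime */
#define MODEL_S_LAZYTIME	(1u << 1)	/* hypothetical per-inode flag */

enum atime_action {
	ATIME_SKIP,		/* leave atime alone */
	ATIME_IN_MEMORY,	/* update in memory only, no i_version check */
	ATIME_PERSIST,		/* log the update, i_version check follows */
};

/* Hypothetical stand-in for struct inode, timestamps in seconds. */
struct atime_inode {
	unsigned int flags;
	int64_t atime, mtime, ctime;
};

/* relatime: is the current atime stale enough to be worth updating? */
static bool model_relatime_need_update(const struct atime_inode *inode,
				       int64_t now)
{
	if (inode->mtime >= inode->atime)
		return true;	/* first read after a modification */
	if (inode->ctime >= inode->atime)
		return true;
	if (now - inode->atime >= 24 * 60 * 60)
		return true;	/* atime is at least a day old */
	return false;
}

/* Decide how a read access should update atime on this inode. */
static enum atime_action model_touch_atime(struct atime_inode *inode,
					   int64_t now)
{
	if (inode->flags & MODEL_S_NOATIME)
		return ATIME_SKIP;
	if (!model_relatime_need_update(inode, now))
		return ATIME_SKIP;

	inode->atime = now;
	if (inode->flags & MODEL_S_LAZYTIME)
		return ATIME_IN_MEMORY;
	return ATIME_PERSIST;
}
```

With the flag set, the first read after a modification still updates
atime, but only in memory, so the persistent update that triggers the
i_version check never happens for that access.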

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


