Re: [PATCH 14/13] xfs: make XFS_IOC_COMMIT_RANGE freshness data opaque

On Fri, 2024-03-01 at 18:48 -0800, Darrick J. Wong wrote:
> On Fri, Mar 01, 2024 at 08:31:21AM -0500, Jeff Layton wrote:
> > On Thu, 2024-02-29 at 15:27 -0800, Darrick J. Wong wrote:
> > > On Tue, Feb 27, 2024 at 08:52:58PM +0200, Amir Goldstein wrote:
> > > > On Tue, Feb 27, 2024 at 7:46 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > > > > 
> > > > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > > > 
> > > > > To head off bikeshedding about the fields in xfs_commit_range, let's
> > > > > make it an opaque u64 array and require the userspace program to call
> > > > > a third ioctl to sample the freshness data for us.  If we ever converge
> > > > > on a definition for i_version then we can use that; for now we'll just
> > > > > use mtime/ctime like the old swapext ioctl.
> > > > 
> > > > This addresses my concerns about using mtime/ctime.
> > > 
> > > Oh good! :)
> > > 
> > > > I have to say, Darrick, that I think that referring to this concern as
> > > > bikeshedding is not being honest.
> > > > 
> > > > I do hate nit picking reviews and I do hate "maybe also fix the world"
> > > > review comments, but I think the question about using mtime/ctime in
> > > > this new API was not out of place
> > > 
> > > I agree, your question about mtime/ctime:
> > > 
> > > "Maybe a stupid question, but under which circumstances would mtime
> > > change and ctime not change? Why are both needed?"
> > > 
> > > was a very good question.  But perhaps that statement referred to the
> > > other part of that thread.
> > > 
> > > >                                   and I think that making the freshness
> > > > data opaque is better for everyone in the long run and hopefully, this will
> > > > help you move to the things you care about faster.
> > > 
> > > I wish you'd suggested an opaque blob that the fs can lay out however it
> > > wants instead of suggesting specifically the change cookie.  I'm very
> > > much ok with an opaque freshness blob that allows future flexibility in
> > > how we define the blob's contents.
> > > 
> > > I was however very upset about Jeff's suggestion of using i_version.
> > > I apologize for using all caps in that reply, and snarling about it in
> > > the commit message here.  The final version of this patch will not have
> > > that.
> > > 
> > > That said, I don't think it is at all helpful to suggest using a file
> > > attribute whose behavior is as yet unresolved.  Multigrain timestamps
> > > were a clever idea, regrettably reverted.  As far as I could tell when I
> > > wrote my reply, neither had NFS implemented a better behavior and
> > > quietly merged it; nor have Jeff and Dave produced any sort of candidate
> > > patchset to fix all the resulting issues in XFS.
> > > 
> > > Reading "I realize that STATX_CHANGE_COOKIE is currently kernel
> > > internal" made me think "OH $deity, they want me to do that work
> > > too???"
> > > 
> > > A better way to have worded that might've been "How about switching
> > > this to a fs-determined structure so that we can switch the freshness
> > > check to i_version when that's fully working on XFS?"
> > > 
> > > The problem I have with reading patch review emails is that I can't
> > > easily tell whether an author's suggestion is being made in a casual
> > > offhand manner, or whether it reflects something they feel strongly
> > > needs to change before merging.
> > > 
> > > In fairness to you, Amir, I don't know how much you've kept on top of
> > > that i_version vs. XFS discussion.  So I have no idea if you were aware
> > > of the status of that work.
> > > 
> > 
> > Sorry, I didn't mean to trigger anyone, but I do have real concerns
> > about any API that attempts to use timestamps to detect whether
> > something has changed.
> > 
> > We learned that lesson in NFS in the 90's. VFS timestamp resolution is
> > just not enough to show whether there was a change to a file -- full
> > stop.
> > 
> > I get the hand-wringing over i_version definitions and I don't care to
> > rehash that discussion here, but I'll point out that this is a
> > (proposed) XFS-private interface:
> > 
> > What you could do is expose the XFS change counter (the one that gets
> > bumped for everything, even atime updates), possibly via a different
> > ioctl, and use that for your "freshness" check.
> > 
> > You'd unfortunately get false negative freshness checks after read
> > operations, but you shouldn't get any false positives (which is a real
> > danger with timestamps).
> 
> I don't see how that would work for this use case.  You have to sample
> file2 before reflinking file2's contents to file1, writing the changes
> to file1, and executing COMMIT_RANGE.  Setting the xfs-private REFLINK
> inode flag on file2 will trigger an iversion update even though it won't
> change mtime or ctime.  The COMMIT then fails due to the inode flags
> change.
> 
> Worse yet, applications aren't going to know if a particular access is
> actually the one that will trigger an atime update.  So this will just
> fail unpredictably.
> 
> If iversion was purely a write counter then I would switch the freshness
> implementation to use it.  But it's not, and I know this to be true
> because I tried that and could not get COMMIT_RANGE to work reliably.
> I suppose the advantage of the blob thing is that we actually /can/
> switch over whenever it's ready.
> 

Yeah, that's the other part -- you have to be willing to redrive the I/O
every time the freshness check fails, which can get expensive depending
on how active the file is. Again this is an XFS interface, so I don't
really have a dog in this fight. If you think timestamps are good
enough, then so be it.
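
To make the redrive cost concrete, here is a rough sketch of the
sample/stage/commit loop as a pure in-memory simulation. Everything in it
(FakeFile, sample_freshness, commit_if_fresh, update_with_retry) is a
hypothetical name made up for illustration; none of it is the real XFS
ioctl interface, and the integer counter only stands in for whatever the
opaque freshness blob ends up containing.

```python
# Hypothetical in-memory model; none of these names are real XFS interfaces.
class FakeFile:
    def __init__(self, data):
        self.data = data
        self.change_count = 0  # stands in for the opaque freshness blob

    def write(self, data):
        self.data = data
        self.change_count += 1

    def sample_freshness(self):
        return self.change_count

    def commit_if_fresh(self, freshness, new_data):
        # Refuse the commit if the file changed after the sample was taken.
        if freshness != self.change_count:
            return False
        self.write(new_data)
        return True


def update_with_retry(f, transform, interfere=None, max_attempts=5):
    """Return the number of attempts needed to commit transform(f.data)."""
    for attempt in range(1, max_attempts + 1):
        freshness = f.sample_freshness()
        staged = transform(f.data)      # the (expensive) work being redriven
        if interfere is not None:
            interfere(f, attempt)       # simulate a concurrent writer
        if f.commit_if_fresh(freshness, staged):
            return attempt
    raise RuntimeError("file too busy; gave up after %d attempts" % max_attempts)
```

Each failed check throws away the staged work and starts over, so on a
busy file the cost grows with the number of intervening changes.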

All I can do is mention that it has been our experience in the NFS world
that relying on timestamps like this will eventually lead to data
corruption. The race conditions may be tight, and much of the time the
race may be benign, but if you do this enough you'll eventually get
bitten, and end up exchanging data when you shouldn't have.
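
A toy illustration of that race, again as a simulation and not real
kernel behavior: the 4 ms tick below is an assumed timestamp granularity
chosen only for the example, and FakeInode is a made-up class. Two writes
that land in the same tick leave the timestamp unchanged, so a
timestamp-based check calls the file fresh while a per-change counter
does not.

```python
# Simulation only: TICK_NS is an assumed granularity, not a kernel constant.
TICK_NS = 4_000_000


class FakeInode:
    def __init__(self):
        self.mtime_ns = 0
        self.change_count = 0

    def write(self, now_ns):
        # The timestamp is truncated to the start of the current tick.
        self.mtime_ns = now_ns - (now_ns % TICK_NS)
        self.change_count += 1


inode = FakeInode()
inode.write(now_ns=10_000_000)      # first write
sampled_mtime = inode.mtime_ns      # freshness sample taken here
sampled_count = inode.change_count

inode.write(now_ns=11_000_000)      # second write, 1 ms later, same tick

timestamp_says_fresh = inode.mtime_ns == sampled_mtime       # missed change
counter_says_fresh = inode.change_count == sampled_count     # change detected
```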

All of that said, I think this is great discussion fodder for LSF this
year. I feel like the time is right to consider these sorts of
interfaces that do synchronized I/O without locking. I've already
proposed a discussion around the state of the i_version counter, so
maybe we can chat about it then?
-- 
Jeff Layton <jlayton@xxxxxxxxxx>




