On Sat, Mar 02, 2024 at 07:43:53AM -0500, Jeff Layton wrote:
> On Fri, 2024-03-01 at 18:48 -0800, Darrick J. Wong wrote:
> > On Fri, Mar 01, 2024 at 08:31:21AM -0500, Jeff Layton wrote:
> > > On Thu, 2024-02-29 at 15:27 -0800, Darrick J. Wong wrote:
> > > > On Tue, Feb 27, 2024 at 08:52:58PM +0200, Amir Goldstein wrote:
> > > > > On Tue, Feb 27, 2024 at 7:46 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > > > >
> > > > > > To head off bikeshedding about the fields in xfs_commit_range, let's
> > > > > > make it an opaque u64 array and require the userspace program to call
> > > > > > a third ioctl to sample the freshness data for us. If we ever converge
> > > > > > on a definition for i_version then we can use that; for now we'll just
> > > > > > use mtime/ctime like the old swapext ioctl.
> > > > >
> > > > > This addresses my concerns about using mtime/ctime.
> > > >
> > > > Oh good! :)
> > > >
> > > > > I have to say, Darrick, that I think that referring to this concern as
> > > > > bikeshedding is not being honest.
> > > > >
> > > > > I do hate nit picking reviews and I do hate "maybe also fix the world"
> > > > > review comments, but I think the question about using mtime/ctime in
> > > > > this new API was not out of place
> > > >
> > > > I agree, your question about mtime/ctime:
> > > >
> > > > "Maybe a stupid question, but under which circumstances would mtime
> > > > change and ctime not change? Why are both needed?"
> > > >
> > > > was a very good question. But perhaps that statement referred to the
> > > > other part of that thread.
> > > >
> > > > > and I think that making the freshness
> > > > > data opaque is better for everyone in the long run and hopefully, this will
> > > > > help you move to the things you care about faster.
> > > >
> > > > I wish you'd suggested an opaque blob that the fs can lay out however it
> > > > wants instead of suggesting specifically the change cookie.
> > > > I'm very much ok with an opaque freshness blob that allows future
> > > > flexibility in how we define the blob's contents.
> > > >
> > > > I was however very upset about Jeff's suggestion of using i_version.
> > > > I apologize for using all caps in that reply, and snarling about it in
> > > > the commit message here. The final version of this patch will not have
> > > > that.
> > > >
> > > > That said, I don't think it is at all helpful to suggest using a file
> > > > attribute whose behavior is as yet unresolved. Multigrain timestamps
> > > > were a clever idea, regrettably reverted. As far as I could tell when I
> > > > wrote my reply, neither had NFS implemented a better behavior and
> > > > quietly merged it; nor have Jeff and Dave produced any sort of candidate
> > > > patchset to fix all the resulting issues in XFS.
> > > >
> > > > Reading "I realize that STATX_CHANGE_COOKIE is currently kernel
> > > > internal" made me think "OH $deity, they wants me to do that work
> > > > too???"
> > > >
> > > > A better way to have worded that might've been "How about switching
> > > > this to a fs-determined structure so that we can switch the freshness
> > > > check to i_version when that's fully working on XFS?"
> > > >
> > > > The problem I have with reading patch review emails is that I can't
> > > > easily tell whether an author's suggestion is being made in a casual
> > > > offhand manner? Or if it reflects something they feel strongly needs
> > > > change before merging.
> > > >
> > > > In fairness to you, Amir, I don't know how much you've kept on top of
> > > > that i_version vs. XFS discussion. So I have no idea if you were aware
> > > > of the status of that work.
> > > >
> > >
> > > Sorry, I didn't mean to trigger anyone, but I do have real concerns
> > > about any API that attempts to use timestamps to detect whether
> > > something has changed.
> > >
> > > We learned that lesson in NFS in the 90's.
> > > VFS timestamp resolution is just not enough to show whether there was a
> > > change to a file -- full stop.
> > >
> > > I get the hand-wringing over i_version definitions and I don't care to
> > > rehash that discussion here, but I'll point out that this is a
> > > (proposed) XFS-private interface:
> > >
> > > What you could do is expose the XFS change counter (the one that gets
> > > bumped for everything, even atime updates, possibly via a different
> > > ioctl), and use that for your "freshness" check.
> > >
> > > You'd unfortunately get false negative freshness checks after read
> > > operations, but you shouldn't get any false positives (which is a real
> > > danger with timestamps).
> >
> > I don't see how that would work for this usecase? You have to sample
> > file2 before reflinking file2's contents to file1, writing the changes
> > to file1, and executing COMMIT_RANGE. Setting the xfs-private REFLINK
> > inode flag on file2 will trigger an iversion update even though it won't
> > change mtime or ctime. The COMMIT then fails due to the inode flags
> > change.
> >
> > Worse yet, applications aren't going to know if a particular access is
> > actually the one that will trigger an atime update. So this will just
> > fail unpredictably.
> >
> > If iversion was purely a write counter then I would switch the freshness
> > implementation to use it. But it's not, and I know this to be true
> > because I tried that and could not get COMMIT_RANGE to work reliably.
> > I suppose the advantage of the blob thing is that we actually /can/
> > switch over whenever it's ready.
> >
>
> Yeah, that's the other part -- you have to be willing to redrive the I/O
> every time the freshness check fails, which can get expensive depending
> on how active the file is. Again this is an XFS interface, so I don't
> really have a dog in this fight. If you think timestamps are good
> enough, then so be it.
>
> All I can do is mention that it has been our experience in the NFS world
> that relying on timestamps like this will eventually lead to data
> corruption. The race conditions may be tight, and much of the time the
> race may be benign, but if you do this enough you'll eventually get
> bitten, and end up exchanging data when you shouldn't have.
>
> All of that said, I think this is great discussion fodder for LSF this
> year. I feel like the time is right to consider these sorts of
> interfaces that do synchronized I/O without locking. I've already
> proposed a discussion around the state of the i_version counter, so
> maybe we can chat about it then?

Yes. I've gotten an invitation, so corporate approval and dumb injuries
notwithstanding, I'll be there this year. :)

--D

> --
> Jeff Layton <jlayton@xxxxxxxxxx>
>