On 2/28/24 2:48 AM, Amir Goldstein wrote:
> On Wed, Feb 28, 2024 at 12:42 AM Dave Chinner via Lsf-pc
>> Essentially, we need to explicitly give POSIX the big finger and
>> state that there are no atomicity guarantees given for write() calls
>> of any size, nor are there any guarantees for data coherency for
>> any overlapping concurrent buffered IO operations.
>>
>
> I have disabled read vs. write atomicity (out-of-tree) to make xfs behave
> as the other fs ever since Jan has added the invalidate_lock and I believe
> that Meta kernel has done that way before.

Hmmm, you might be thinking of my patch to prevent kswapd from getting
stuck on XFS inode reclaim, but I don't think we've ever messed with
write concurrency. I'm comfortable with the concurrency change in
general, but it's not somewhere I'd be excited about differing from
upstream.

Total tangent, but we only carry two XFS patches right now that aren't
upstream. I dropped the inode reclaim patch; the problem stopped
showing up in our profiles, and the impacted workloads changed to
rocksdb for other reasons.

We flip XFS discards back to synchronous. Async discards without any
kind of metering saturate drives when we do bulk deletes, leading to
latency spikes on reads and writes. There's probably a class of flash
that can handle this, but we don't have it.

Unfortunately I also disable large folios on XFS. They are corrupting
xarrays on our v5.19 kernel, with large folios from multiple files
interleaved together in the same file. We'll try again with them on
v6.4 or maybe v6.8, but the repro needs thousands of machines making
NFS noises just to trigger one failure, and I won't be able to debug it
until I can make a more reasonable reproduction.

-chris