On Mon, Jan 27, 2025 at 09:15:41PM -0800, Christoph Hellwig wrote:
> On Tue, Jan 28, 2025 at 07:49:17AM +1100, Dave Chinner wrote:
> > > As for why an exclusive lock is needed for append writes, it's
> > > because we don't want the EOF to be modified during the append
> > > write.
> > 
> > We don't care if the EOF moves during the append write at the
> > filesystem level. We set kiocb->ki_pos = i_size_read() from
> > generic_write_checks() under shared locking, and if we then race
> > with another extending append write there are two cases:
> > 
> > 1. the other task has already extended i_size; or
> > 2. we have two IOs at the same offset (i.e. at i_size).
> > 
> > In either case, we don't need exclusive locking for the IO because
> > the worst thing that happens is that two IOs hit the same file
> > offset. IOWs, it has always been left up to the application to
> > serialise RWF_APPEND writes on XFS, not the filesystem.
> 
> I disagree. O_APPEND (RWF_APPEND is just the Linux-specific
> per-I/O version of that) is extensively used for things like
> multi-threaded loggers where you have multiple threads doing
> O_APPEND writes to a single log file, and they expect to not lose
> data that way.

Sure, but I don't think we need full file-offset-scope IO exclusion
to solve that problem. We can still safely allow concurrent writes
within EOF to occur whilst another buffered append write is doing
file extension work.

IOWs, where we really need to get to is a model that allows
concurrent buffered IO at all times, except for the case where IO
operations that change the inode size need to serialise against other
similar operations (e.g. other EOF-extending IOs, truncate, etc).

Hence I think we can largely ignore O_APPEND for the purposes of
prototyping shared buffered IO and getting rid of the IOLOCK from the
XFS IO path. I may end up re-using the i_rwsem as an "EOF
modification" serialisation mechanism for O_APPEND and extending
writes in general, but I don't think we need a general write IO
exclusion mechanism for this...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
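
[For illustration, a minimal userspace sketch of the serialisation
model described above. All names here are invented for the example; a
pthread mutex stands in for whatever kernel lock ends up serialising
EOF changes, and none of this is actual XFS/VFS code. The point it
demonstrates: within-EOF writes take no EOF-scope lock at all, while
append/extending writes serialise only the i_size update among
themselves, which also keeps concurrent O_APPEND loggers from
overwriting each other's records.]

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

struct inode_model {
	pthread_mutex_t eof_lock;  /* serialises i_size changes only */
	_Atomic long    i_size;    /* models i_size_read()/i_size_write() */
};

/*
 * Write entirely below EOF: needs no EOF-scope serialisation at all,
 * so it can run concurrently with an in-progress append.
 */
static void write_within_eof(struct inode_model *ip, long pos, long len)
{
	/* ... copy data into the page cache for [pos, pos + len) ... */
	(void)ip; (void)pos; (void)len;
}

/*
 * O_APPEND/extending write: serialise only the EOF modification, so
 * concurrent appenders get distinct offsets and lose no data.
 */
static long write_append(struct inode_model *ip, long len)
{
	pthread_mutex_lock(&ip->eof_lock);
	long pos = atomic_load(&ip->i_size);   /* our write offset */
	/* ... copy data into the page cache for [pos, pos + len) ... */
	atomic_store(&ip->i_size, pos + len);  /* move EOF */
	pthread_mutex_unlock(&ip->eof_lock);
	return pos;
}

int main(void)
{
	struct inode_model ip = {
		.eof_lock = PTHREAD_MUTEX_INITIALIZER,
		.i_size   = 0,
	};

	write_append(&ip, 4096);        /* extends EOF to 4096 */
	write_within_eof(&ip, 0, 512);  /* safe concurrently with appends */
	printf("i_size = %ld\n", atomic_load(&ip.i_size));
	return 0;
}

[Truncate and other size-changing operations would take eof_lock the
same way; the design point is that the exclusion is scoped to inode
size changes, not to all write IO on the file.]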