On Sat, Jan 18, 2025 at 02:03:41PM +0100, Amir Goldstein wrote: > On Fri, Jan 17, 2025 at 11:19 PM Dave Chinner > <david@xxxxxxxxxxxxx> wrote: > > On Fri, Jan 17, 2025 at 02:27:46PM +0100, Amir Goldstein wrote: > > > On Wed, Jan 15, 2025 at 10:41 PM Dave Chinner > > > <david@xxxxxxxxxxxxx> wrote: For all practical purposes, we > > > could maintain a counter in inode not for submitted DIO, but > > > for files opened O_DIRECT. > > > > > > The first open O_DIRECT could serialize with in-flight > > > buffered writes holding a shared iolock and then buffered > > > writes could take SH vs. EX iolock depending on folio state > > > and on i_dio_open_count. > > > > I don't see how this can be made to work w.r.t. sane data > > coherency behaviour. e.g. how does this model serialise a new > > DIO write that is submitted after a share-locked buffered write > > has just started and passed all the "i_dio_count == 0" checks > > that enable it to use shared locking? i.e. we now have > > potentially overlapping buffered writes and DIO writes being > > done concurrently because the buffered write may not have > > instantiated folios in the page cache yet.... > > > > A shared-locked buffered write can only start when there is no > O_DIRECT file open on the inode. The first open for O_DIRECT to > increment i_dio_open_count needs to take exclusive iolock to wait > for all in-flight buffered writes then release the iolock. > > All DIO submitted via O_DIRECT fds will be safe against in-flight > share-locked buffered write. Hooking ->open() and ->release() isn't sufficient. O_DIRECT can be changed at any time via fcntl(F_SETFL) which isn't synchronised against IO in progress at all. i.e. we can't track that, nor can we even use f_ops->check_flags to reject changes to O_DIRECT state because it doesn't pass either the filp or the inode.... IOWs, as it stands logic like "is the file opened for O_DIRECT" in the IO path is inherently racy and the racing mechanisms are directly under the control of unprivileged userspace applications. Without mods from the VFS down and new hooks being implemented in XFS along with all the whacky "which serialisiation method do we use" logic and dealing with the complexities and switching between them, tracking struct files that are open for O_DIRECT isn't really a viable mechanism for enabling shared buffered writes... -Dave. -- Dave Chinner david@xxxxxxxxxxxxx