On 4/14/23 9:36 AM, Darrick J. Wong wrote:
> On Thu, Apr 13, 2023 at 10:11:28PM -0700, Christoph Hellwig wrote:
>> On Thu, Apr 13, 2023 at 09:40:29AM +0200, Miklos Szeredi wrote:
>>> fuse_direct_write_iter():
>>>
>>> bool exclusive_lock =
>>>     !(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) ||
>>>     iocb->ki_flags & IOCB_APPEND ||
>>>     fuse_direct_write_extending_i_size(iocb, from);
>>>
>>> If the write is size extending, then it will take the lock exclusive.
>>> OTOH, I guess that it would be unusual for lots of size extending
>>> writes to be done in parallel.
>>>
>>> What would be the effect of giving the FMODE_DIO_PARALLEL_WRITE hint
>>> and then still serializing the writes?
>>
>> I have no idea how this flags work, but XFS also takes i_rwsem
>> exclusively for appends, when the positions and size aren't aligned to
>> the block size, and a few other cases.
>
> IIUC uring wants to avoid the situation where someone sends 300 writes
> to the same file, all of which end up in background workers, and all of
> which then contend on exclusive i_rwsem. Hence it has some hashing
> scheme that executes io requests serially if they hash to the same value
> (which iirc is the inode number?) to prevent resource waste.
>
> This flag turns off that hashing behavior on the assumption that each of
> those 300 writes won't serialize on the other 299 writes, hence it's ok
> to start up 300 workers.
>
> (apologies for precoffee garbled response)

Yep, that is pretty much it. If all writes to that inode are serialized
by a lock on the fs side, then we'll get a lot of contention on that
mutex. And since, originally, nothing supported async writes, everything
would get punted to the io-wq workers. io_uring added per-inode hashing
for this, so that any punt to io-wq of a write would get serialized.

IOW, it's an efficiency thing, not a correctness thing.

-- 
Jens Axboe
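
To make the "300 writes" example concrete, here is a rough userspace model
of the scheduling effect described above. It is only a sketch and not the
io-wq code: every toy_* name, the bucket count, and the choice to key the
hash on the inode number (which the thread itself is unsure about) are
assumptions made for illustration. The parallel_dio_hint field merely
stands in for a file that advertises FMODE_DIO_PARALLEL_WRITE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS  64   /* arbitrary bucket count for this model */
#define TOY_UNHASHED (-1) /* request is not tied to any bucket */

/* Hypothetical stand-in for a write punted to a worker pool. */
struct toy_write {
    uint64_t ino;           /* inode number of the target file */
    bool parallel_dio_hint; /* models FMODE_DIO_PARALLEL_WRITE being set */
};

/*
 * Pick the serialization bucket for a punted write. Writes sharing a
 * bucket run one at a time; unhashed writes may each get a worker.
 */
static int toy_bucket(const struct toy_write *w)
{
    if (w->parallel_dio_hint)
        return TOY_UNHASHED;
    return (int)(w->ino % TOY_BUCKETS);
}

/*
 * Upper bound on how many of the n writes could be running at once:
 * one per distinct bucket in use, plus one per unhashed write.
 */
static size_t toy_max_concurrency(const struct toy_write *w, size_t n)
{
    bool busy[TOY_BUCKETS] = { false };
    size_t runnable = 0;

    for (size_t i = 0; i < n; i++) {
        int b = toy_bucket(&w[i]);

        if (b == TOY_UNHASHED) {
            runnable++;
        } else {
            assert(b >= 0 && b < TOY_BUCKETS);
            if (!busy[b]) {
                busy[b] = true;
                runnable++;
            }
        }
    }
    return runnable;
}
```

In this model, 300 writes to one inode without the hint collapse to a
single runnable worker, while the same 300 writes with the hint may occupy
300 workers. If the filesystem then takes the lock exclusively anyway, the
result is still correct; the workers just queue up on the lock, which is
the wasted effort the hashing was meant to avoid.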