On 2/9/24 11:50, Miklos Szeredi wrote:
> On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
>>  static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>>  {
>> +       int err = 0;
>> +
>>         assert_spin_locked(&fi->lock);
>> -       if (fi->iocachectr < 0)
>> -               return -ETXTBSY;
>> -       if (fi->iocachectr++ == 0)
>> -               set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>> -       return 0;
>> +       /*
>> +        * Setting the bit advises new direct-io writes to use an exclusive
>> +        * lock - without it the wait below might be forever.
>> +        */
>> +       set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>> +       while (!err && fuse_is_io_cache_wait(fi)) {
>> +               spin_unlock(&fi->lock);
>> +               err = wait_event_killable(fi->direct_io_waitq,
>> +                                         !fuse_is_io_cache_wait(fi));
>> +               spin_lock(&fi->lock);
>> +       }
>> +       /*
>> +        * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
>> +        * failed to enter caching mode and no other caching open exists.
>> +        */
>> +       if (!err)
>> +               fi->iocachectr++;
>> +       else if (fi->iocachectr <= 0)
>> +               clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>
> This seems wrong: if the current task is killed, and there's another
> task trying to get cached open mode, then clearing
> FUSE_I_CACHE_IO_MODE will allow new parallel writes, breaking this
> logic.

This is called while holding the spin lock, so another task cannot
enter here? Neither can direct-IO, because it is locked out as well.

The bit lets the DIO code avoid attempting parallel DIO without having
to take the spin lock. When DIO decides it wants to do parallel IO, it
first has to get past the fi->iocachectr < 0 check - if there is
another task trying to do cached IO, either DIO gets the counter below
zero first and the cache task has to wait, or the cache task gets it
above zero and DIO continues with the exclusive lock (a rough
user-space sketch of this interplay is appended below).

Or am I missing something?

> I'd suggest fixing this by not making the wait killable. It's just a
> nice to have, but without all the other waits being killable (e.g.
> filesystem locks) it's just a drop in the ocean.

Thanks,
Bernd
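
For what it's worth, here is a minimal user-space model of the protocol
as I understand it. This is not kernel code: a pthread mutex stands in
for fi->lock and a condition variable for fi->direct_io_waitq, and the
function names cache_open()/parallel_dio_begin()/parallel_dio_end() are
invented for illustration only (they are not the FUSE helpers). The
killable aspect of the wait is left out to keep it short.

        /*
         * Toy model of the iocachectr / FUSE_I_CACHE_IO_MODE protocol.
         * iocachectr > 0: cached opens, iocachectr < 0: parallel DIO.
         */
        #include <pthread.h>
        #include <stdbool.h>

        static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
        static pthread_cond_t direct_io_waitq = PTHREAD_COND_INITIALIZER;
        static int iocachectr;
        static bool cache_io_mode;      /* models FUSE_I_CACHE_IO_MODE */

        /* Caching open: wait until all parallel DIO has drained. */
        static void cache_open(void)
        {
                pthread_mutex_lock(&lock);
                /* Advise new DIO writers to take the exclusive lock. */
                cache_io_mode = true;
                while (iocachectr < 0)
                        pthread_cond_wait(&direct_io_waitq, &lock);
                iocachectr++;
                pthread_mutex_unlock(&lock);
        }

        /* DIO write deciding whether parallel (shared-locked) IO is ok. */
        static bool parallel_dio_begin(void)
        {
                bool ok = false;

                if (cache_io_mode)      /* lockless hint: use excl. lock */
                        return false;
                pthread_mutex_lock(&lock);
                if (iocachectr <= 0 && !cache_io_mode) {
                        iocachectr--;   /* deny caching opens while we run */
                        ok = true;
                }
                pthread_mutex_unlock(&lock);
                return ok;
        }

        static void parallel_dio_end(void)
        {
                pthread_mutex_lock(&lock);
                if (++iocachectr == 0)
                        pthread_cond_broadcast(&direct_io_waitq);
                pthread_mutex_unlock(&lock);
        }

Either parallel_dio_begin() drives the counter below zero first and
cache_open() has to wait, or cache_open() gets it above zero first and
the DIO path falls back to the exclusive lock - which is the ordering
I was trying to describe above.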