On 10/24/23 8:30 AM, Jens Axboe wrote:
> I don't think this is related to the io-wq workers doing non-blocking
> IO. The callback is eventually executed by the task that originally
> submitted the IO, which is the owner and not the async workers. But...
> If that original task is blocked in eg fallocate, then I can see how
> that would potentially be an issue.
>
> I'll take a closer look.

I think the best way to fix this is likely to have inode_dio_wait() be
interruptible, and return -ERESTARTSYS if it should be restarted. Now
the below is obviously not a full patch, but I suspect it'll make ext4
and xfs tick, because they should both be affected.

Andres, any chance you can throw this into the testing mix?

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 202c76996b62..0d946b6d36fe 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4747,7 +4747,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 	}
 
 	/* Wait all existing dio workers, newcomers will block on i_rwsem */
-	inode_dio_wait(inode);
+	ret = inode_dio_wait(inode);
+	if (ret)
+		goto out;
 
 	ret = file_modified(file);
 	if (ret)
diff --git a/fs/inode.c b/fs/inode.c
index 84bc3c76e5cc..c4eca812b16b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2417,17 +2417,24 @@ EXPORT_SYMBOL(inode_owner_or_capable);
 /*
  * Direct i/o helper functions
  */
-static void __inode_dio_wait(struct inode *inode)
+static int __inode_dio_wait(struct inode *inode)
 {
 	wait_queue_head_t *wq = bit_waitqueue(&inode->i_state, __I_DIO_WAKEUP);
 	DEFINE_WAIT_BIT(q, &inode->i_state, __I_DIO_WAKEUP);
+	int ret = 0;
 
 	do {
-		prepare_to_wait(wq, &q.wq_entry, TASK_UNINTERRUPTIBLE);
-		if (atomic_read(&inode->i_dio_count))
-			schedule();
+		prepare_to_wait(wq, &q.wq_entry, TASK_INTERRUPTIBLE);
+		if (!atomic_read(&inode->i_dio_count))
+			break;
+		schedule();
+		if (signal_pending(current)) {
+			ret = -ERESTARTSYS;
+			break;
+		}
 	} while (atomic_read(&inode->i_dio_count));
 	finish_wait(wq, &q.wq_entry);
+	return ret;
 }
 
 /**
@@ -2440,10 +2447,11 @@ static void __inode_dio_wait(struct inode *inode)
  * Must be called under a lock that serializes taking new references
  * to i_dio_count, usually by inode->i_mutex.
  */
-void inode_dio_wait(struct inode *inode)
+int inode_dio_wait(struct inode *inode)
 {
 	if (atomic_read(&inode->i_dio_count))
-		__inode_dio_wait(inode);
+		return __inode_dio_wait(inode);
+	return 0;
 }
 EXPORT_SYMBOL(inode_dio_wait);
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 203700278ddb..8ea0c414b173 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -936,7 +936,9 @@ xfs_file_fallocate(
 	 * the on disk and in memory inode sizes, and the operations that follow
 	 * require the in-memory size to be fully up-to-date.
 	 */
-	inode_dio_wait(inode);
+	error = inode_dio_wait(inode);
+	if (error)
+		goto out_unlock;
 
 	/*
 	 * Now AIO and DIO has drained we flush and (if necessary) invalidate
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4a40823c3c67..7dff3167cb0c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2971,7 +2971,7 @@ static inline ssize_t blockdev_direct_IO(struct kiocb *iocb,
 }
 #endif
 
-void inode_dio_wait(struct inode *inode);
+int inode_dio_wait(struct inode *inode);
 
 /**
  * inode_dio_begin - signal start of a direct I/O requests

-- 
Jens Axboe
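
For reference, a minimal sketch of what any other inode_dio_wait() caller
would have to look like once the signature change above lands; the
function name some_fs_setattr() below is a made-up placeholder, not
something from the patch or from any real filesystem:

/*
 * Hypothetical call site, for illustration only: once inode_dio_wait()
 * can fail with -ERESTARTSYS, the caller has to check the result and
 * unwind so the syscall can be restarted after the signal is handled.
 */
static int some_fs_setattr(struct inode *inode)
{
	int error;

	inode_lock(inode);

	/* Drain in-flight direct I/O; may now be interrupted by a signal. */
	error = inode_dio_wait(inode);
	if (error)
		goto out_unlock;	/* -ERESTARTSYS: bail out, restart later */

	/* ... size/extent manipulation that must not race with DIO ... */

out_unlock:
	inode_unlock(inode);
	return error;
}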