Patch "io_uring/rw: fix missing NOWAIT check for O_DIRECT start write" has been added to the 5.15-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    io_uring/rw: fix missing NOWAIT check for O_DIRECT start write

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     io_uring-rw-fix-missing-nowait-check-for-o_direct-st.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit e4520932bc363f59716cf9a4454255fe86a95a68
Author: Jens Axboe <axboe@xxxxxxxxx>
Date:   Thu Oct 31 08:05:44 2024 -0600

    io_uring/rw: fix missing NOWAIT check for O_DIRECT start write
    
    Commit 1d60d74e852647255bd8e76f5a22dc42531e4389 upstream.
    
    When io_uring starts a write, it'll call kiocb_start_write() to bump the
    super block rwsem, preventing any freezes from happening while that
    write is in-flight. The freeze side will grab that rwsem for writing,
    excluding any new writers from happening and waiting for existing writes
    to finish. But io_uring unconditionally uses kiocb_start_write(), which
    will block if someone is currently attempting to freeze the mount point.
    This causes a deadlock where freeze is waiting for previous writes to
    complete, but the previous writes cannot complete, as the task that is
    supposed to complete them is blocked waiting on starting a new write.
    This results in the following stuck trace showing that dependency with
    the write blocked starting a new write:
    
    task:fio             state:D stack:0     pid:886   tgid:886   ppid:876
    Call trace:
     __switch_to+0x1d8/0x348
     __schedule+0x8e8/0x2248
     schedule+0x110/0x3f0
     percpu_rwsem_wait+0x1e8/0x3f8
     __percpu_down_read+0xe8/0x500
     io_write+0xbb8/0xff8
     io_issue_sqe+0x10c/0x1020
     io_submit_sqes+0x614/0x2110
     __arm64_sys_io_uring_enter+0x524/0x1038
     invoke_syscall+0x74/0x268
     el0_svc_common.constprop.0+0x160/0x238
     do_el0_svc+0x44/0x60
     el0_svc+0x44/0xb0
     el0t_64_sync_handler+0x118/0x128
     el0t_64_sync+0x168/0x170
    INFO: task fsfreeze:7364 blocked for more than 15 seconds.
          Not tainted 6.12.0-rc5-00063-g76aaf945701c #7963
    
    with the attempting freezer stuck trying to grab the rwsem:
    
    task:fsfreeze        state:D stack:0     pid:7364  tgid:7364  ppid:995
    Call trace:
     __switch_to+0x1d8/0x348
     __schedule+0x8e8/0x2248
     schedule+0x110/0x3f0
     percpu_down_write+0x2b0/0x680
     freeze_super+0x248/0x8a8
     do_vfs_ioctl+0x149c/0x1b18
     __arm64_sys_ioctl+0xd0/0x1a0
     invoke_syscall+0x74/0x268
     el0_svc_common.constprop.0+0x160/0x238
     do_el0_svc+0x44/0x60
     el0_svc+0x44/0xb0
     el0t_64_sync_handler+0x118/0x128
     el0t_64_sync+0x168/0x170
    
    Fix this by having the io_uring side honor IOCB_NOWAIT, and only attempt a
    blocking grab of the super block rwsem if it isn't set. For normal issue
    where IOCB_NOWAIT would always be set, this returns -EAGAIN which will
    have io_uring core issue a blocking attempt of the write. That will in
    turn also get completions run, ensuring forward progress.
    
    Since freezing requires CAP_SYS_ADMIN in the first place, this isn't
    something that can be triggered by a regular user.
    
    Cc: stable@xxxxxxxxxxxxxxx # 5.10+
    Reported-by: Peter Mann <peter.mann@xxxxx>
    Link: https://lore.kernel.org/io-uring/38c94aec-81c9-4f62-b44e-1d87f5597644@xxxxx
    Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 718ab6a418425..b53099b595cc7 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3724,6 +3724,25 @@ static int io_write_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return io_prep_rw(req, sqe, WRITE);
 }
 
+static bool io_kiocb_start_write(struct io_kiocb *req, struct kiocb *kiocb)
+{
+	struct inode *inode;
+	bool ret;
+
+	if (!(req->flags & REQ_F_ISREG))
+		return true;
+	if (!(kiocb->ki_flags & IOCB_NOWAIT)) {
+		kiocb_start_write(kiocb);
+		return true;
+	}
+
+	inode = file_inode(kiocb->ki_filp);
+	ret = sb_start_write_trylock(inode->i_sb);
+	if (ret)
+		__sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
+	return ret;
+}
+
 static int io_write(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
@@ -3770,8 +3789,8 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(ret))
 		goto out_free;
 
-	if (req->flags & REQ_F_ISREG)
-		kiocb_start_write(kiocb);
+	if (unlikely(!io_kiocb_start_write(req, kiocb)))
+		goto copy_iov;
 	kiocb->ki_flags |= IOCB_WRITE;
 
 	if (req->file->f_op->write_iter)




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux