On 10/10/22 8:54 PM, Jens Axboe wrote:
> On 10/10/22 8:10 PM, Pavel Begunkov wrote:
>> On 10/11/22 03:01, Jens Axboe wrote:
>>> On 10/10/22 7:10 PM, Pavel Begunkov wrote:
>>>> On 10/11/22 01:40, Dave Chinner wrote:
>>>> [...]
>>>>> I note that there are changes to the io_uring IO path and write
>>>>> IO end accounting in the io_uring stack that was merged, and there
>>>>> was no doubt about the success/failure of the reproducer at each
>>>>> step. Hence I think the bisect is good, and the problem is somewhere
>>>>> in the io_uring changes.
>>>>>
>>>>> Jens, over to you.
>>>>>
>>>>> The reproducer - generic/068 - is 100% reliable here, io_uring is
>>>>> being exercised by fsstress in the background whilst the filesystem
>>>>> is being frozen and thawed repeatedly. Some path in the io_uring
>>>>> code has an unbalanced sb_start_write()/sb_end_write() pair by the
>>>>> look of it....
>>>>
>>>> A quick guess, it's probably
>>>>
>>>> b000145e99078 ("io_uring/rw: defer fsnotify calls to task context")
>>>>
>>>> From a quick look, it removes kiocb_end_write() -> sb_end_write()
>>>> from kiocb_done(), which is a kind of buffered rw completion path.
>>>
>>> Yeah, I'll take a look.
>>> Didn't get the original email, only Pavel's reply?
>>
>> Forwarded.
>
> Looks like the email did get delivered, it just ended up in the
> fsdevel inbox.

Nope, it was marked as spam by gmail...

>> Not tested, but it should be something like below. Apart from obvious
>> cases like __io_complete_rw_common(), we should also keep in mind the
>> case where we don't complete the request but ask for a reissue with
>> REQ_F_REISSUE; that's what the first hunk is for.
>
> Can we move this into a helper?

Something like this? Not super happy with it, but...

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 453e0ae92160..1c8d00f9af9f 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -234,11 +234,32 @@ static void kiocb_end_write(struct io_kiocb *req)
 	}
 }
 
+/*
+ * Trigger the notifications after having done some IO, and finish the write
+ * accounting, if any.
+ */
+static void io_req_io_end(struct io_kiocb *req)
+{
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+
+	if (rw->kiocb.ki_flags & IOCB_WRITE) {
+		kiocb_end_write(req);
+		fsnotify_modify(req->file);
+	} else {
+		fsnotify_access(req->file);
+	}
+}
+
 static bool __io_complete_rw_common(struct io_kiocb *req, long res)
 {
 	if (unlikely(res != req->cqe.res)) {
 		if ((res == -EAGAIN || res == -EOPNOTSUPP) &&
 		    io_rw_should_reissue(req)) {
+			/*
+			 * Reissue will start accounting again, finish the
+			 * current cycle.
+			 */
+			io_req_io_end(req);
 			req->flags |= REQ_F_REISSUE | REQ_F_PARTIAL_IO;
 			return true;
 		}
@@ -264,15 +285,7 @@ static inline int io_fixup_rw_res(struct io_kiocb *req, long res)
 
 static void io_req_rw_complete(struct io_kiocb *req, bool *locked)
 {
-	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
-
-	if (rw->kiocb.ki_flags & IOCB_WRITE) {
-		kiocb_end_write(req);
-		fsnotify_modify(req->file);
-	} else {
-		fsnotify_access(req->file);
-	}
-
+	io_req_io_end(req);
 	io_req_task_complete(req, locked);
 }
 
@@ -317,6 +330,7 @@ static int kiocb_done(struct io_kiocb *req, ssize_t ret,
 		req->file->f_pos = rw->kiocb.ki_pos;
 	if (ret >= 0 && (rw->kiocb.ki_complete == io_complete_rw)) {
 		if (!__io_complete_rw_common(req, ret)) {
+			io_req_io_end(req);
 			io_req_set_res(req, final_ret,
 				       io_put_kbuf(req, issue_flags));
 			return IOU_OK;

-- 
Jens Axboe
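
For readers following along without the kernel tree handy, the failure
mode is easy to model in plain userspace C. The sketch below is not
kernel code: sb_writers, issue_write(), complete_write(), and the
write_account_* helpers are made-up stand-ins for the superblock write
counter and the io_uring issue/completion paths under discussion. It
only illustrates why one completion path that skips the "end write"
half leaves the counter nonzero, which is what makes a later freeze
(generic/068's freeze/thaw loop) wait forever.

#include <assert.h>
#include <stdio.h>

/* Models the per-superblock writer count that freezing waits on. */
static int sb_writers;

/* Stand-in for sb_start_write(): taken when a write is issued. */
static void write_account_start(void)
{
	sb_writers++;
}

/* Stand-in for sb_end_write(): must run exactly once per write. */
static void write_account_end(void)
{
	assert(sb_writers > 0);
	sb_writers--;
}

static void issue_write(void)
{
	write_account_start();
}

/*
 * The regression dropped the "end" call from one completion path;
 * passing balanced == 0 models a request going down that path.
 */
static void complete_write(int balanced)
{
	if (balanced)
		write_account_end();
}

int main(void)
{
	issue_write();
	complete_write(1);
	printf("balanced completion:   sb_writers = %d (freeze can proceed)\n",
	       sb_writers);

	issue_write();
	complete_write(0);
	printf("unbalanced completion: sb_writers = %d (freeze would block)\n",
	       sb_writers);
	return 0;
}

The patch above closes that hole by funneling every completion flavor
(task-context completion, the inline completion in kiocb_done(), and the
REQ_F_REISSUE path) through the single io_req_io_end() helper, so no
path can skip the kiocb_end_write() -> sb_end_write() half again.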