On 12/5/21 15:02, Hao Xu wrote:
On 12/3/21 10:21 PM, Pavel Begunkov wrote:
On 12/3/21 07:30, Hao Xu wrote:
On 12/3/21 10:01 AM, Pavel Begunkov wrote:
On 12/3/21 01:39, Pavel Begunkov wrote:
On 11/26/21 10:07, Hao Xu wrote:
v4->v5
- change the implementation of merge_wq_list
[...]
But testing with the liburing tests I'm getting the stuff below;
e.g. cq-overflow hits it every time. Double-checked that
I took the [RESEND] version of 6/6.
[ 30.360370] BUG: scheduling while atomic: cq-overflow/2082/0x00000000
[ 30.360520] Call Trace:
[ 30.360523] <TASK>
[ 30.360527] dump_stack_lvl+0x4c/0x63
[ 30.360536] dump_stack+0x10/0x12
[ 30.360540] __schedule_bug.cold+0x50/0x5e
[ 30.360545] __schedule+0x754/0x900
[ 30.360551] ? __io_cqring_overflow_flush+0xb6/0x200
[ 30.360558] schedule+0x55/0xd0
[ 30.360563] schedule_timeout+0xf8/0x140
[ 30.360567] ? prepare_to_wait_exclusive+0x58/0xa0
[ 30.360573] __x64_sys_io_uring_enter+0x69c/0x8e0
[ 30.360578] ? io_rsrc_buf_put+0x30/0x30
[ 30.360582] do_syscall_64+0x3b/0x80
[ 30.360588] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 30.360592] RIP: 0033:0x7f9f9680118d
[ 30.360618] </TASK>
[ 30.362295] BUG: scheduling while atomic: cq-overflow/2082/0x7ffffffe
[ 30.362396] Call Trace:
[ 30.362397] <TASK>
[ 30.362399] dump_stack_lvl+0x4c/0x63
[ 30.362406] dump_stack+0x10/0x12
[ 30.362409] __schedule_bug.cold+0x50/0x5e
[ 30.362413] __schedule+0x754/0x900
[ 30.362419] schedule+0x55/0xd0
[ 30.362423] schedule_timeout+0xf8/0x140
[ 30.362427] ? prepare_to_wait_exclusive+0x58/0xa0
[ 30.362431] __x64_sys_io_uring_enter+0x69c/0x8e0
[ 30.362437] ? io_rsrc_buf_put+0x30/0x30
[ 30.362440] do_syscall_64+0x3b/0x80
[ 30.362445] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 30.362449] RIP: 0033:0x7f9f9680118d
[ 30.362470] </TASK>
<repeated>
I cannot repro this; all the liburing tests work well on my side.
One problem: when, on the first iteration, tctx_task_work doesn't
have anything in prior_task_list, it goes to handle_tw_list(),
which sets up @ctx but leaves @locked=false (say there is
contention). Then on the second iteration it goes to
handle_prior_tw_list() with a non-NULL @ctx and @locked=false,
and tries to unlock a spinlock that was never locked.
Not sure that's exactly the problem from the traces, but at
least a quick hack resetting the ctx at the beginning of
handle_prior_tw_list() heals it.
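To make the sequence concrete, here is a minimal userspace model of
that ordering (hypothetical names loosely mirroring the patch, not the
kernel code itself); the assert stands in for the unbalanced
spin_unlock():

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ctx { bool spin_held; };

static struct ctx ctx_a;

/* models handle_tw_list(): sets *cur but leaves *locked == false
 * when the uring_lock trylock is contended */
static void handle_tw_list_model(struct ctx **cur, bool *locked)
{
	*cur = &ctx_a;
	*locked = false;	/* mutex_trylock() lost the race */
}

/* models handle_prior_tw_list() before the fix: the lock is only
 * taken when the request's ctx differs from the stale *cur */
static void handle_prior_tw_list_model(struct ctx **cur, bool *locked)
{
	struct ctx *req_ctx = &ctx_a;	/* first req on the prior list */

	if (req_ctx != *cur) {		/* skipped: stale ctx matches */
		*cur = req_ctx;
		if (!*locked)
			(*cur)->spin_held = true;	/* spin_lock() */
	}
	/* ... but the commit path unlocks unconditionally */
	assert((*cur)->spin_held);	/* fires: unlock of unheld lock */
	(*cur)->spin_held = false;	/* spin_unlock() */
}

int main(void)
{
	struct ctx *cur = NULL;
	bool locked = false;

	handle_tw_list_model(&cur, &locked);		/* iteration 1 */
	handle_prior_tw_list_model(&cur, &locked);	/* iteration 2 */
	return 0;
}

With the quick fix (resetting cur to NULL between the two calls),
the branch runs and the lock is taken before the unlock.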
Good catch, thanks.
note: apart from the quick fix, the diff below includes
a couple of lines to force it to go through the new path.
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 66d119ac4424..3868123eef87 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2272,6 +2272,9 @@ static inline void ctx_commit_and_unlock(struct io_ring_ctx *ctx)
static void handle_prior_tw_list(struct io_wq_work_node *node, struct io_ring_ctx **ctx,
bool *locked)
{
+ ctx_flush_and_put(*ctx, locked);
+ *ctx = NULL;
+
do {
struct io_wq_work_node *next = node->next;
struct io_kiocb *req = container_of(node, struct io_kiocb,
@@ -2283,7 +2286,8 @@ static void handle_prior_tw_list(struct io_wq_work_node *node, struct io_ring_ct
ctx_flush_and_put(*ctx, locked);
*ctx = req->ctx;
/* if not contended, grab and improve batching */
- *locked = mutex_trylock(&(*ctx)->uring_lock);
+ *locked = false;
+ // *locked = mutex_trylock(&(*ctx)->uring_lock);
I believe this one is your debug code, which I shouldn't take, should I?
Right, just for debug; it helped to catch the issue. FWIW,
ctx_flush_and_put() doesn't seem like a good solution, but it
was good enough to verify my assumptions.
percpu_ref_get(&(*ctx)->refs);
if (unlikely(!*locked))
spin_lock(&(*ctx)->completion_lock);
@@ -2840,7 +2844,7 @@ static void io_complete_rw(struct kiocb *kiocb, long res)
return;
req->result = res;
req->io_task_work.func = io_req_task_complete;
- io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL));
+ io_req_task_work_add(req, true);
}
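
Btw, for anyone following along: the reason resetting *ctx up front
works lies in how ctx_flush_and_put() behaves. A rough paraphrase of
the 5.16-era helper (from memory, so check the tree for the
authoritative version):

static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
{
	if (!ctx)
		return;		/* NULL ctx: nothing to drop */
	if (*locked) {
		/* flush completions batched under uring_lock, then drop it */
		if (ctx->submit_state.compl_nr)
			io_submit_flush_completions(ctx);
		mutex_unlock(&ctx->uring_lock);
		*locked = false;
	}
	percpu_ref_put(&ctx->refs);	/* pairs with percpu_ref_get() */
}

That is, it only releases uring_lock when *locked says it is held, so
flushing the stale ctx on entry is safe, and leaving *ctx NULL forces
the loop to take completion_lock for the first request.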
--
Pavel Begunkov