On 2021/11/25 10:30 PM, Pavel Begunkov wrote:
On 11/25/21 09:21, Hao Xu wrote:
ctx->cq_extra should be protected by the completion lock so that
req_need_defer() does the right check.
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx>
---
fs/io_uring.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index f666a0e7f5e8..ae9534382b26 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6537,12 +6537,15 @@ static __cold void io_drain_req(struct io_kiocb *req)
 	u32 seq = io_get_sequence(req);
 
 	/* Still need defer if there is pending req in defer list. */
+	spin_lock(&ctx->completion_lock);
 	if (!req_need_defer(req, seq) && list_empty_careful(&ctx->defer_list)) {
+		spin_unlock(&ctx->completion_lock);
I haven't checked the sync assumptions, but it has been like this since
the very beginning, so I'm curious whether you have seen any hangs or
other problems?
No, I just went over it in my mind: cq_extra and cached_cq_tail are both
updated in one completion_lock critical section, so reading them without
the lock may yield inconsistent values of cq_extra and cached_cq_tail,
and thus req_need_defer() may return the wrong result. For example,
req_need_defer() may see the updated cached_cq_tail but still the old
cq_extra value. This is possible since io_rsrc_put_work() runs in a
system worker.
The consequence of the missing lock is that a drain request may be
delayed a little more or less than it should be.
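
(For reference, the check in question looks roughly like this in
fs/io_uring.c of this period; this is a sketch, not the exact tree
state:

static bool req_need_defer(struct io_kiocb *req, u32 seq)
{
	struct io_ring_ctx *ctx = req->ctx;

	/* Without completion_lock, cached_cq_tail and cq_extra can be
	 * observed out of sync, e.g. a new cached_cq_tail paired with a
	 * stale cq_extra, so this comparison can give the wrong answer.
	 */
	return seq + READ_ONCE(ctx->cq_extra) != ctx->cached_cq_tail;
}
)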
There is also a cq_extra-- in io_get_sqe(), which is on the hot path, so
I'm inclined not to touch it.
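
(For context, the hot-path decrement mentioned above sits in
io_get_sqe() roughly as follows; again a sketch of that era's
fs/io_uring.c, not the exact code:

static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
{
	unsigned head, mask = ctx->sq_entries - 1;
	unsigned sq_idx = ctx->cached_sq_head++ & mask;

	head = READ_ONCE(ctx->sq_array[sq_idx]);
	if (likely(head < ctx->sq_entries))
		return &ctx->sq_sqes[head];

	/* Drop invalid entries: a dropped sqe never produces a CQE, so
	 * compensate the drain accounting. This runs on the submission
	 * hot path, hence the reluctance to take completion_lock here.
	 */
	ctx->cq_extra--;
	WRITE_ONCE(ctx->rings->sq_dropped,
		   READ_ONCE(ctx->rings->sq_dropped) + 1);
	return NULL;
}
)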
In any case, as drain is pushed to the slow path, I'm all for
simplifying synchronisation here.