On 6/11/22 00:10, Pavel Begunkov wrote:
On 6/10/22 16:45, Hao Xu wrote:
On 6/10/22 18:21, Pavel Begunkov wrote:
On 6/8/22 12:12, Hao Xu wrote:
From: Hao Xu <howeyxu@xxxxxxxxxxx>
Add a new io_hash_bucket structure so that each bucket in cancel_hash
has its own spinlock. Use a per-entry lock for cancel_hash; this removes
some completion_lock invocations and removes contention between different
cancel_hash entries.
Signed-off-by: Hao Xu <howeyxu@xxxxxxxxxxx>
---
v1->v2:
- Add per-entry lock for the poll/apoll task work code, which was missed
in v1
- Add a member in io_kiocb to track the req's index in cancel_hash
v2->v3:
- Align struct io_hash_bucket to a cacheline to avoid cacheline
false sharing.
- Re-calculate the hash value when deleting an entry from cancel_hash.
(cannot leverage struct io_poll to store the index since it's
already 64 bytes)
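
For reference, the per-bucket structure could look roughly like this (a
sketch based on the changelog above; the exact layout and the
____cacheline_aligned_in_smp annotation are assumptions, not necessarily
what the patch ends up with):

/* one bucket of cancel_hash: its own lock plus the hlist it protects,
 * aligned to a cacheline so neighbouring buckets don't false-share */
struct io_hash_bucket {
        spinlock_t        lock;
        struct hlist_head list;
} ____cacheline_aligned_in_smp;
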
io_uring/cancel.c | 14 +++++++--
io_uring/cancel.h | 6 ++++
io_uring/fdinfo.c | 9 ++++--
io_uring/io_uring.c | 8 +++--
io_uring/io_uring_types.h | 2 +-
io_uring/poll.c | 64 +++++++++++++++++++++------------------
6 files changed, 65 insertions(+), 38 deletions(-)
diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 83cceb52d82d..bced5d6b9294 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -93,14 +93,14 @@ int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
if (!ret)
return 0;
- spin_lock(&ctx->completion_lock);
ret = io_poll_cancel(ctx, cd);
if (ret != -ENOENT)
goto out;
+ spin_lock(&ctx->completion_lock);
if (!(cd->flags & IORING_ASYNC_CANCEL_FD))
ret = io_timeout_cancel(ctx, cd);
-out:
spin_unlock(&ctx->completion_lock);
+out:
return ret;
}
@@ -192,3 +192,13 @@ int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
io_req_set_res(req, ret, 0);
return IOU_OK;
}
+
+inline void init_hash_table(struct io_hash_bucket *hash_table, unsigned size)
Not inline, it can break builds
diff --git a/io_uring/cancel.h b/io_uring/cancel.h
index 4f35d8696325..b57d6706f84d 100644
--- a/io_uring/cancel.h
+++ b/io_uring/cancel.h
@@ -4,3 +4,9 @@ int io_async_cancel_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags);
int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd);
+inline void init_hash_table(struct io_hash_bucket *hash_table, unsigned size);
And this inline as well
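
For illustration, the non-inline variant being asked for could be shaped
like this (a sketch following the names in the patch; an inline function
defined only in one .c file may get no out-of-line symbol emitted, so
callers in other translation units can fail to link):

/* io_uring/cancel.c: plain external definition, no inline keyword */
void init_hash_table(struct io_hash_bucket *hash_table, unsigned size)
{
        unsigned int i;

        for (i = 0; i < size; i++) {
                spin_lock_init(&hash_table[i].lock);
                INIT_HLIST_HEAD(&hash_table[i].list);
        }
}

/* io_uring/cancel.h: matching declaration, also without inline */
void init_hash_table(struct io_hash_bucket *hash_table, unsigned size);
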
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 0df5eca93b16..515f1727e3c6 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
[...]
static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
struct io_cancel_data *cd)
- __must_hold(&ctx->completion_lock)
{
- struct hlist_head *list;
struct io_kiocb *req;
- list = &ctx->cancel_hash[hash_long(cd->data, ctx->cancel_hash_bits)];
- hlist_for_each_entry(req, list, hash_node) {
+ u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
+ struct io_hash_bucket *hb = &ctx->cancel_hash[index];
+
+ spin_lock(&hb->lock);
+ hlist_for_each_entry(req, &hb->list, hash_node) {
if (cd->data != req->cqe.user_data)
continue;
if (poll_only && req->opcode != IORING_OP_POLL_ADD)
@@ -569,47 +577,48 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
continue;
req->work.cancel_seq = cd->seq;
}
+ spin_unlock(&hb->lock);
The problem here is that after you unlock, nothing keeps the
request alive. Before it was more like
lock(completion_lock);
req = poll_find();
cancel_poll(req);
unlock(completion_lock);
and was relying on all of this happening under ->completion_lock.
Now the following io_poll_disarm() and/or io_poll_cancel_req() race.
Same with io_poll_file_find().
Looks like we have to add the completion_lock back for the cancellation path.
It was relying on completion_lock only because it was guarding
the hashing, so now find+cancel should happen under the
per-bucket spins, i.e.
lock(buckets[index].lock);
req = poll_find();
cancel_poll(req);
unlock(buckets[index].lock);
A bit trickier to code but doable.
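
A rough sketch of that shape, just to illustrate (the helper name
io_poll_cancel_hashed is made up here, and real code would also need the
cd->flags/sequence checks that io_poll_find() does):

static int io_poll_cancel_hashed(struct io_ring_ctx *ctx,
                                 struct io_cancel_data *cd)
{
        u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
        struct io_hash_bucket *hb = &ctx->cancel_hash[index];
        struct io_kiocb *req;
        int ret = -ENOENT;

        spin_lock(&hb->lock);
        hlist_for_each_entry(req, &hb->list, hash_node) {
                if (cd->data != req->cqe.user_data)
                        continue;
                /* still under hb->lock, so the req can't go away under us */
                io_poll_cancel_req(req);
                ret = 0;
                break;
        }
        spin_unlock(&hb->lock);
        return ret;
}
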
Ah, it seems I misunderstood your words; I'm clear on it now.
Yeah, it's a bit odd. I'll think about this issue before taking this
solution tomorrow.
Btw, I saw a req->refcount set for poll_add; it seems unnecessary?
(I haven't checked it carefully yet)