Re: [External] Re: [PATCH v3 2/3] io_uring: avoid ring quiesce while registering/unregistering eventfd

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 03/02/2022 17:56, Jens Axboe wrote:
On 2/3/22 10:41 AM, Usama Arif wrote:
@@ -1726,13 +1732,24 @@ static inline struct io_uring_cqe *io_get_cqe(struct io_ring_ctx *ctx)
  	return &rings->cqes[tail & mask];
  }
-static inline bool io_should_trigger_evfd(struct io_ring_ctx *ctx)
+static void io_eventfd_signal(struct io_ring_ctx *ctx)
  {
-	if (likely(!ctx->cq_ev_fd))
-		return false;
+	struct io_ev_fd *ev_fd;
+
+	rcu_read_lock();
+	/* rcu_dereference ctx->io_ev_fd once and use it for both for checking and eventfd_signal */
+	ev_fd = rcu_dereference(ctx->io_ev_fd);
+
+	if (likely(!ev_fd))
+		goto out;
  	if (READ_ONCE(ctx->rings->cq_flags) & IORING_CQ_EVENTFD_DISABLED)
-		return false;
-	return !ctx->eventfd_async || io_wq_current_is_worker();
+		goto out;
+
+	if (!ctx->eventfd_async || io_wq_current_is_worker())
+		eventfd_signal(ev_fd->cq_ev_fd, 1);
+
+out:
+	rcu_read_unlock();
  }

Like Pavel pointed out, we still need the fast path (of not having an
event fd registered at all) to just do the cheap check and not need rcu
lock/unlock. Outside of that, I think this looks fine.


Hmm, maybe i didn't understand you and Pavel correctly. Are you suggesting to do the below diff over patch 3? I dont think that would be correct, as it is possible that just after checking if ctx->io_ev_fd is present unregister can be called by another thread and set ctx->io_ev_fd to NULL that would cause a NULL pointer exception later? In the current patch, the check of whether ev_fd exists happens as the first thing after rcu_read_lock and the rcu_read_lock are extremely cheap i believe.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 25ed86533910..0cf282fba14d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1736,12 +1736,13 @@ static void io_eventfd_signal(struct io_ring_ctx *ctx)
 {
        struct io_ev_fd *ev_fd;

+       if (likely(!ctx->io_ev_fd))
+               return;
+
        rcu_read_lock();
/* rcu_dereference ctx->io_ev_fd once and use it for both for checking and eventfd_signal */
        ev_fd = rcu_dereference(ctx->io_ev_fd);

-       if (likely(!ev_fd))
-               goto out;
        if (READ_ONCE(ctx->rings->cq_flags) & IORING_CQ_EVENTFD_DISABLED)
                goto out;




  static int io_eventfd_unregister(struct io_ring_ctx *ctx)
  {
-	if (ctx->cq_ev_fd) {
-		eventfd_ctx_put(ctx->cq_ev_fd);
-		ctx->cq_ev_fd = NULL;
-		return 0;
+	struct io_ev_fd *ev_fd;
+	int ret;
+
+	mutex_lock(&ctx->ev_fd_lock);
+	ev_fd = rcu_dereference_protected(ctx->io_ev_fd, lockdep_is_held(&ctx->ev_fd_lock));
+	if (!ev_fd) {
+		ret = -ENXIO;
+		goto out;
  	}
+	synchronize_rcu();
+	eventfd_ctx_put(ev_fd->cq_ev_fd);
+	kfree(ev_fd);
+	rcu_assign_pointer(ctx->io_ev_fd, NULL);
+	ret = 0;
- return -ENXIO;
+out:
+	mutex_unlock(&ctx->ev_fd_lock);
+	return ret;
  }

synchronize_rcu() can take a long time, and I think this is in the wrong
spot. It should be on the register side, IFF we need to expedite the
completion of a previous event fd unregistration. If we do it that way,
at least it'll only happen if it's necessary. What do you think?



How about the approach in v4? so switching back to call_rcu as in v2 and if ctx->io_ev_fd is NULL then we call rcu_barrier to make sure all rcu callbacks are finished and check for NULL again.

Thanks!
Usama



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux