On Fri 21-01-11 05:31:53, Nick Piggin wrote:
> On Thu, Jan 20, 2011 at 3:03 PM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > On Thu, Jan 20, 2011 at 08:20:00AM +1100, Nick Piggin wrote:
> >> On Thu, Jan 20, 2011 at 8:03 AM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> >> >> I don't know exactly how all programs use io_destroy -- of the small
> >> >> number that do, probably an even smaller number would care here. But I
> >> >> don't think it simplifies things enough to use synchronize_rcu for it.
> >> >
> >> > Above it sounded like you didn't think AIO should be using RCU at all.
> >>
> >> synchronize_rcu of course, not RCU (typo).
> >
> > I think that Nick is suggesting that call_rcu() be used instead.
> > Perhaps also very sparing use of synchronize_rcu_expedited(), which
> > is faster than synchronize_rcu(), but which uses more CPU time.
>
> call_rcu() is the obvious alternative, yes.
>
> Basically, once we give in to synchronize_rcu() we're giving up. That's
> certainly a very good tradeoff for something like filesystem
> unregistration or module unload, where it buys big simplifications in
> real fastpaths. But I just don't think it should be taken lightly.

So in the end, I've realized I don't need synchronize_rcu() at all, and in
fact everything is OK even without call_rcu() if I base my fix on top of
your patch. Attached is your patch with the comment I proposed added, and
also a patch fixing the second race. Better?

								Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
>From 68857d7f2087edbbc5ee1d828f151ac46406f3be Mon Sep 17 00:00:00 2001
From: Nick Piggin <npiggin@xxxxxxxxx>
Date: Thu, 20 Jan 2011 20:08:52 +0100
Subject: [PATCH 1/2] fs: Fix aio rcu ioctx lookup

aio-dio-invalidate-failure GPFs in aio_put_req from io_submit.

lookup_ioctx doesn't implement the rcu lookup pattern properly.
rcu_read_lock does not prevent refcount going to zero, so we might take
a refcount on a zero count ioctx.

Fix the bug by atomically testing for zero refcount before incrementing.

[JK: Added comment into the code]

Signed-off-by: Nick Piggin <npiggin@xxxxxxxxx>
Signed-off-by: Jan Kara <jack@xxxxxxx>
---
 fs/aio.c |   35 ++++++++++++++++++++++++-----------
 1 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index fc557a3..b4dd668 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -239,15 +239,23 @@ static void __put_ioctx(struct kioctx *ctx)
 	call_rcu(&ctx->rcu_head, ctx_rcu_free);
 }

-#define get_ioctx(kioctx) do {						\
-	BUG_ON(atomic_read(&(kioctx)->users) <= 0);			\
-	atomic_inc(&(kioctx)->users);					\
-} while (0)
-#define put_ioctx(kioctx) do {						\
-	BUG_ON(atomic_read(&(kioctx)->users) <= 0);			\
-	if (unlikely(atomic_dec_and_test(&(kioctx)->users)))		\
-		__put_ioctx(kioctx);					\
-} while (0)
+static inline void get_ioctx(struct kioctx *kioctx)
+{
+	BUG_ON(atomic_read(&kioctx->users) <= 0);
+	atomic_inc(&kioctx->users);
+}
+
+static inline int try_get_ioctx(struct kioctx *kioctx)
+{
+	return atomic_inc_not_zero(&kioctx->users);
+}
+
+static inline void put_ioctx(struct kioctx *kioctx)
+{
+	BUG_ON(atomic_read(&kioctx->users) <= 0);
+	if (unlikely(atomic_dec_and_test(&kioctx->users)))
+		__put_ioctx(kioctx);
+}

 /* ioctx_alloc
  *	Allocates and initializes an ioctx.  Returns an ERR_PTR if it failed.
@@ -601,8 +609,13 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
 	rcu_read_lock();

 	hlist_for_each_entry_rcu(ctx, n, &mm->ioctx_list, list) {
-		if (ctx->user_id == ctx_id && !ctx->dead) {
-			get_ioctx(ctx);
+		/*
+		 * RCU protects us against accessing freed memory but
+		 * we have to be careful not to get a reference when the
+		 * reference count already dropped to 0 (ctx->dead test
+		 * is unreliable because of races).
+		 */
+		if (ctx->user_id == ctx_id && !ctx->dead && try_get_ioctx(ctx)) {
 			ret = ctx;
 			break;
 		}
--
1.7.1
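[For readers less familiar with the pattern: the heart of the first patch is atomic_inc_not_zero(), which refuses to take a reference once the count has already hit zero. A minimal userspace model with C11 atomics — not kernel code, just a sketch that mirrors the kernel helper's name and semantics:]

```c
#include <stdatomic.h>

/*
 * Userspace model of the kernel's atomic_inc_not_zero(): take a
 * reference only if the refcount has not already dropped to zero.
 * Returns 1 on success (reference taken), 0 if the object is dying.
 */
static int atomic_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	do {
		if (old == 0)
			return 0;	/* count hit zero: object is being freed */
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));

	return 1;
}
```

[An RCU reader that loses this race simply skips the entry instead of resurrecting a dying object, which is exactly what the fixed lookup_ioctx() does.]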
>From 6d5375d55b5d88e8ceda739052566e033be620c2 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@xxxxxxx>
Date: Wed, 19 Jan 2011 00:37:48 +0100
Subject: [PATCH 2/2] fs: Fix race between io_destroy() and io_submit() in AIO

A race can occur when io_submit() races with io_destroy():

 CPU1						CPU2
 io_submit()
   do_io_submit()
     ...
     ctx = lookup_ioctx(ctx_id);
						io_destroy()
     Now do_io_submit() holds the last reference to ctx.
     ...
     queue new AIO
     put_ioctx(ctx) - frees ctx with active AIOs

We solve this issue by checking whether ctx is being destroyed in AIO
submission path after adding new AIO to ctx. Then we are guaranteed that
either io_destroy() waits for new AIO or we see that ctx is being
destroyed and bail out.

Signed-off-by: Jan Kara <jack@xxxxxxx>
---
 fs/aio.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index b4dd668..0244c04 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1642,6 +1642,21 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 		goto out_put_req;

 	spin_lock_irq(&ctx->ctx_lock);
+	/*
+	 * We could have raced with io_destroy() and are currently holding a
+	 * reference to ctx which should be destroyed. We cannot submit IO
+	 * since ctx gets freed as soon as io_submit() puts its reference.
+	 * The check here is reliable since io_destroy() sets ctx->dead
+	 * before waiting for outstanding IO. Thus if we don't see ctx->dead
+	 * set here, io_destroy() waits for our IO to finish.
+	 * The check is inside ctx->ctx_lock to avoid extra memory barrier
+	 * in this fast path...
+	 */
+	if (ctx->dead) {
+		spin_unlock_irq(&ctx->ctx_lock);
+		ret = -EINVAL;
+		goto out_put_req;
+	}
 	aio_run_iocb(req);
 	if (!list_empty(&ctx->run_list)) {
 		/* drain the run list */
--
1.7.1
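[The handshake the second patch relies on — the destroy path sets a dead flag under the same lock the submit path takes, so the two cannot interleave badly — can be sketched in plain pthreads. All names here (struct ctx, submit, destroy_begin) are illustrative stand-ins, not the kernel API:]

```c
#include <pthread.h>
#include <stdbool.h>

/*
 * Illustrative userspace sketch of the ctx->dead handshake. Both paths
 * take the same lock, so once destroy_begin() has set dead, no later
 * submit() can slip a new request in.
 */
struct ctx {
	pthread_mutex_t lock;
	bool dead;
	int active_reqs;
};

/* Submit path: refuse new work once the context is being torn down. */
int submit(struct ctx *c)
{
	pthread_mutex_lock(&c->lock);
	if (c->dead) {
		pthread_mutex_unlock(&c->lock);
		return -1;	/* stand-in for -EINVAL */
	}
	c->active_reqs++;
	pthread_mutex_unlock(&c->lock);
	return 0;
}

/*
 * Destroy path: flag the context dead under the lock, then (elsewhere,
 * not shown) wait for active_reqs to drain before freeing the context.
 */
void destroy_begin(struct ctx *c)
{
	pthread_mutex_lock(&c->lock);
	c->dead = true;
	pthread_mutex_unlock(&c->lock);
}
```

[Because the flag is read and written under ctx_lock, the patch needs no extra memory barrier on the submit fast path — the lock's ordering does the work, as the added comment in the patch notes.]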