On Thu 20-01-11 21:16:02, Jan Kara wrote:
> On Fri 21-01-11 05:31:53, Nick Piggin wrote:
> > On Thu, Jan 20, 2011 at 3:03 PM, Paul E. McKenney
> > <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > > On Thu, Jan 20, 2011 at 08:20:00AM +1100, Nick Piggin wrote:
> > >> On Thu, Jan 20, 2011 at 8:03 AM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> > >> >> I don't know exactly how all programs use io_destroy -- of the small
> > >> >> number that do, probably an even smaller number would care here. But I
> > >> >> don't think it simplifies things enough to use synchronize_rcu for it.
> > >> >
> > >> > Above it sounded like you didn't think AIO should be using RCU at all.
> > >>
> > >> synchronize_rcu of course, not RCU (typo).
> > >
> > > I think that Nick is suggesting that call_rcu() be used instead.
> > > Perhaps also very sparing use of synchronize_rcu_expedited(), which
> > > is faster than synchronize_rcu(), but which uses more CPU time.
> >
> > call_rcu() is the obvious alternative, yes.
> >
> > Basically, once we give in to synchronize_rcu() we're basically giving
> > up. That's certainly a very good tradeoff for something like filesystem
> > unregistration or module unload, it buys big simplifications in real
> > fastpaths. But I just don't think it should be taken lightly.
>   So in the end, I've realized I don't need synchronize_rcu() at all and
> in fact everything is OK even without call_rcu() if I base my fix on top
> of your patch.
>
>   Attached is your patch with the added comment I proposed and also a patch
> fixing the second race. Better?
  Nick, any opinion on this? Should I push the patches upstream?

								Honza

> From 68857d7f2087edbbc5ee1d828f151ac46406f3be Mon Sep 17 00:00:00 2001
> From: Nick Piggin <npiggin@xxxxxxxxx>
> Date: Thu, 20 Jan 2011 20:08:52 +0100
> Subject: [PATCH 1/2] fs: Fix aio rcu ioctx lookup
>
> aio-dio-invalidate-failure GPFs in aio_put_req from io_submit.
>
> lookup_ioctx doesn't implement the rcu lookup pattern properly.
> rcu_read_lock does not prevent the refcount going to zero, so we might
> take a refcount on a zero count ioctx.
>
> Fix the bug by atomically testing for a zero refcount before incrementing.
>
> [JK: Added comment into the code]
>
> Signed-off-by: Nick Piggin <npiggin@xxxxxxxxx>
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> ---
>  fs/aio.c |   35 ++++++++++++++++++++++++-----------
>  1 files changed, 24 insertions(+), 11 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index fc557a3..b4dd668 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -239,15 +239,23 @@ static void __put_ioctx(struct kioctx *ctx)
>  	call_rcu(&ctx->rcu_head, ctx_rcu_free);
>  }
>
> -#define get_ioctx(kioctx) do {					\
> -	BUG_ON(atomic_read(&(kioctx)->users) <= 0);		\
> -	atomic_inc(&(kioctx)->users);				\
> -} while (0)
> -#define put_ioctx(kioctx) do {					\
> -	BUG_ON(atomic_read(&(kioctx)->users) <= 0);		\
> -	if (unlikely(atomic_dec_and_test(&(kioctx)->users)))	\
> -		__put_ioctx(kioctx);				\
> -} while (0)
> +static inline void get_ioctx(struct kioctx *kioctx)
> +{
> +	BUG_ON(atomic_read(&kioctx->users) <= 0);
> +	atomic_inc(&kioctx->users);
> +}
> +
> +static inline int try_get_ioctx(struct kioctx *kioctx)
> +{
> +	return atomic_inc_not_zero(&kioctx->users);
> +}
> +
> +static inline void put_ioctx(struct kioctx *kioctx)
> +{
> +	BUG_ON(atomic_read(&kioctx->users) <= 0);
> +	if (unlikely(atomic_dec_and_test(&kioctx->users)))
> +		__put_ioctx(kioctx);
> +}
>
>  /* ioctx_alloc
>   *	Allocates and initializes an ioctx.  Returns an ERR_PTR if it failed.
> @@ -601,8 +609,13 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
>  	rcu_read_lock();
>
>  	hlist_for_each_entry_rcu(ctx, n, &mm->ioctx_list, list) {
> -		if (ctx->user_id == ctx_id && !ctx->dead) {
> -			get_ioctx(ctx);
> +		/*
> +		 * RCU protects us against accessing freed memory but
> +		 * we have to be careful not to get a reference when the
> +		 * reference count already dropped to 0 (ctx->dead test
> +		 * is unreliable because of races).
> +		 */
> +		if (ctx->user_id == ctx_id && !ctx->dead && try_get_ioctx(ctx)) {
>  			ret = ctx;
>  			break;
>  		}
> --
> 1.7.1
>
> From 6d5375d55b5d88e8ceda739052566e033be620c2 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@xxxxxxx>
> Date: Wed, 19 Jan 2011 00:37:48 +0100
> Subject: [PATCH 2/2] fs: Fix race between io_destroy() and io_submit() in AIO
>
> A race can occur when io_submit() races with io_destroy():
>
>  CPU1						CPU2
> io_submit()
>   do_io_submit()
>     ...
>     ctx = lookup_ioctx(ctx_id);
> 						io_destroy()
>     Now do_io_submit() holds the last reference to ctx.
>     ...
>     queue new AIO
>     put_ioctx(ctx) - frees ctx with active AIOs
>
> We solve this issue by checking whether ctx is being destroyed in the
> AIO submission path after adding the new AIO to ctx. Then we are
> guaranteed that either io_destroy() waits for the new AIO or we see
> that ctx is being destroyed and bail out.
>
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> ---
>  fs/aio.c |   15 +++++++++++++++
>  1 files changed, 15 insertions(+), 0 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index b4dd668..0244c04 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1642,6 +1642,21 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>  		goto out_put_req;
>
>  	spin_lock_irq(&ctx->ctx_lock);
> +	/*
> +	 * We could have raced with io_destroy() and are currently holding a
> +	 * reference to ctx which should be destroyed. We cannot submit IO
> +	 * since ctx gets freed as soon as io_submit() puts its reference.
> +	 * The check here is reliable since io_destroy() sets ctx->dead before
> +	 * waiting for outstanding IO. Thus if we don't see ctx->dead set here,
> +	 * io_destroy() waits for our IO to finish.
> +	 * The check is inside ctx->ctx_lock to avoid extra memory barrier
> +	 * in this fast path...
> +	 */
> +	if (ctx->dead) {
> +		spin_unlock_irq(&ctx->ctx_lock);
> +		ret = -EINVAL;
> +		goto out_put_req;
> +	}
>  	aio_run_iocb(req);
>  	if (!list_empty(&ctx->run_list)) {
>  		/* drain the run list */
> --
> 1.7.1
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
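
As an aside, the "take a reference only if the count is still non-zero" idea behind
try_get_ioctx()/atomic_inc_not_zero() can be sketched in plain userspace C11. The sketch
below is only illustrative -- struct obj, try_get() and put() are made-up names, not
anything from fs/aio.c -- but it shows why a lookup done under rcu_read_lock() must refuse
an object whose refcount has already reached zero:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct obj {
	atomic_int users;		/* plays the role of kioctx->users */
};

/* Userspace stand-in for atomic_inc_not_zero(): take a reference only
 * if the count is still non-zero, i.e. the object is not already dying. */
static bool try_get(struct obj *o)
{
	int old = atomic_load(&o->users);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&o->users, &old, old + 1))
			return true;	/* reference taken */
		/* CAS failed: 'old' was reloaded with the current value, retry */
	}
	return false;			/* count hit zero, caller must not use the object */
}

static void put(struct obj *o)
{
	if (atomic_fetch_sub(&o->users, 1) == 1)
		printf("last reference dropped, object can be freed\n");
}

int main(void)
{
	struct obj live  = { .users = 1 };	/* healthy object */
	struct obj dying = { .users = 0 };	/* refcount already reached zero */

	printf("live:  try_get() -> %d\n", try_get(&live));	/* 1: safe to use */
	printf("dying: try_get() -> %d\n", try_get(&dying));	/* 0: lookup must bail out */

	put(&live);		/* drop the reference try_get() took */
	put(&live);		/* drop the initial reference */
	return 0;
}

The interesting case is "dying": under RCU the pointer may still be reachable from the
list, but plain get_ioctx() there would resurrect an object whose teardown has already
started, which is exactly the GPF the first patch fixes.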