On Wed, Jul 05, 2017 at 03:28:14PM -0400, Jeff Moyer wrote:
> Mauricio Faria de Oliveira <mauricfo@xxxxxxxxxxxxxxxxxx> writes:
> 
> > Currently, aio-nr is incremented in steps of 'num_possible_cpus() * 8'
> > for io_setup(nr_events, ..) with 'nr_events < num_possible_cpus() * 4':
> > 
> >     ioctx_alloc()
> >     ...
> >         nr_events = max(nr_events, num_possible_cpus() * 4);
> >         nr_events *= 2;
> >     ...
> >         ctx->max_reqs = nr_events;
> >     ...
> >         aio_nr += ctx->max_reqs;
> >     ....
> > 
> > This limits the number of aio contexts actually available to much less
> > than aio-max-nr, and is increasingly worse with greater number of CPUs.
> > 
> > For example, with 64 CPUs, only 256 aio contexts are actually available
> > (with aio-max-nr = 65536) because the increment is 512 in that scenario.
> > 
> > Note: 65536 [max aio contexts] / (64*4*2) [increment per aio context]
> > is 128, but make it 256 (double) as counting against 'aio-max-nr * 2':
> > 
> >     ioctx_alloc()
> >     ...
> >         if (aio_nr + nr_events > (aio_max_nr * 2UL) ||
> >         ...
> >             goto err_ctx;
> >     ...
> > 
> > This patch uses the original value of nr_events (from userspace) to
> > increment aio-nr and count against aio-max-nr, which resolves those.
> > 
> > Signed-off-by: Mauricio Faria de Oliveira <mauricfo@xxxxxxxxxxxxxxxxxx>
> > Reported-by: Lekshmi C. Pillai <lekshmi.cpillai@xxxxxxxxxx>
> > Tested-by: Lekshmi C. Pillai <lekshmi.cpillai@xxxxxxxxxx>
> > Tested-by: Paul Nguyen <nguyenp@xxxxxxxxxx>
> 
> Thanks for your persistence in re-posting this. The fix looks good to
> me. Ben, can you queue this up?

I'm queuing this up in my aio-next and will push upstream after a few
days of soaking in linux-next.

		-ben

> Reviewed-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
> 
> > ---
> >  fs/aio.c | 19 ++++++++++++-------
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/aio.c b/fs/aio.c
> > index f52d925ee259..3908480d7ccd 100644
> > --- a/fs/aio.c
> > +++ b/fs/aio.c
> > @@ -441,10 +441,9 @@ static int aio_migratepage(struct address_space *mapping, struct page *new,
> >  #endif
> >  };
> >  
> > -static int aio_setup_ring(struct kioctx *ctx)
> > +static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
> >  {
> >  	struct aio_ring *ring;
> > -	unsigned nr_events = ctx->max_reqs;
> >  	struct mm_struct *mm = current->mm;
> >  	unsigned long size, unused;
> >  	int nr_pages;
> > @@ -707,6 +706,12 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
> >  	int err = -ENOMEM;
> >  
> >  	/*
> > +	 * Store the original nr_events -- what userspace passed to io_setup(),
> > +	 * for counting against the global limit -- before it changes.
> > +	 */
> > +	unsigned int max_reqs = nr_events;
> > +
> > +	/*
> >  	 * We keep track of the number of available ringbuffer slots, to prevent
> >  	 * overflow (reqs_available), and we also use percpu counters for this.
> >  	 *
> > @@ -724,14 +729,14 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
> >  		return ERR_PTR(-EINVAL);
> >  	}
> >  
> > -	if (!nr_events || (unsigned long)nr_events > (aio_max_nr * 2UL))
> > +	if (!nr_events || (unsigned long)max_reqs > aio_max_nr)
> >  		return ERR_PTR(-EAGAIN);
> >  
> >  	ctx = kmem_cache_zalloc(kioctx_cachep, GFP_KERNEL);
> >  	if (!ctx)
> >  		return ERR_PTR(-ENOMEM);
> >  
> > -	ctx->max_reqs = nr_events;
> > +	ctx->max_reqs = max_reqs;
> >  
> >  	spin_lock_init(&ctx->ctx_lock);
> >  	spin_lock_init(&ctx->completion_lock);
> > @@ -753,7 +758,7 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
> >  	if (!ctx->cpu)
> >  		goto err;
> >  
> > -	err = aio_setup_ring(ctx);
> > +	err = aio_setup_ring(ctx, nr_events);
> >  	if (err < 0)
> >  		goto err;
> >  
> > @@ -764,8 +769,8 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
> >  
> >  	/* limit the number of system wide aios */
> >  	spin_lock(&aio_nr_lock);
> > -	if (aio_nr + nr_events > (aio_max_nr * 2UL) ||
> > -	    aio_nr + nr_events < aio_nr) {
> > +	if (aio_nr + ctx->max_reqs > aio_max_nr ||
> > +	    aio_nr + ctx->max_reqs < aio_nr) {
> >  		spin_unlock(&aio_nr_lock);
> >  		err = -EAGAIN;
> >  		goto err_ctx;
> 
-- 
"Thought is the essence of where you are now."
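For anyone following the arithmetic in the changelog above, here is a minimal userspace sketch (not kernel code) that mirrors the old and new per-io_setup() accounting against aio-max-nr. The CPU count of 64, the aio-max-nr value of 65536, and the request size of 1 are simply the example figures from the commit message.

```c
/*
 * Userspace sketch of the aio-nr accounting discussed above.
 *
 * Old scheme: nr_events is clamped up to num_possible_cpus() * 4,
 * then doubled, and that inflated value is charged against aio-max-nr * 2.
 * New scheme (this patch): the caller's original nr_events is charged
 * against aio-max-nr directly.
 */
#include <stdio.h>

static unsigned long old_charge(unsigned int nr_events, unsigned int cpus)
{
	if (nr_events < cpus * 4)
		nr_events = cpus * 4;		/* max(nr_events, num_possible_cpus() * 4) */
	return (unsigned long)nr_events * 2;	/* nr_events *= 2 */
}

int main(void)
{
	const unsigned int cpus = 64;		/* example CPU count from the changelog */
	const unsigned long aio_max_nr = 65536;	/* default fs.aio-max-nr */
	const unsigned int nr_events = 1;	/* a tiny io_setup() request */

	unsigned long old = old_charge(nr_events, cpus);	/* 512 with 64 CPUs */
	unsigned long new = nr_events;				/* 1 after this patch */

	printf("old: %lu charged per context -> %lu contexts fit under aio-max-nr * 2\n",
	       old, (aio_max_nr * 2) / old);
	printf("new: %lu charged per context -> %lu contexts fit under aio-max-nr\n",
	       new, aio_max_nr / new);
	return 0;
}
```

With the changelog's figures this prints 256 contexts for the old accounting and 65536 for the new one, which is exactly the gap the patch is addressing.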