Decouple ctx->max_reqs and ctx->nr_events; each one represents a
different side of the same coin -- userspace and kernelspace,
respectively.

Briefly, ctx->max_reqs represents what is externally requested and
accessible by userspace, and ctx->nr_events represents what is
internally needed by the percpu allocation scheme.

With the percpu scheme, the original value of ctx->max_reqs from
userspace is changed (but still used to count against aio_max_nr)
based on num_possible_cpus(), and it may increase significantly on
systems with a large num_possible_cpus() for smaller nr_events. This
eventually prevents userspace applications from getting the actual
value of aio_max_nr in the total requested nr_events.

ctx->max_reqs
=============

The ctx->max_reqs value once again aligns with its description:

	 * This is what userspace passed to io_setup(), it's not used for
	 * anything but counting against the global max_reqs quota.

It stores the original value of nr_events that userspace passed to
io_setup() (it is not increased to make room for the requirements of
the percpu allocation scheme), and is used to increment and decrement
the 'aio_nr' value and to check against 'aio_max_nr'.

So, regardless of how many additional nr_events are internally
required for the percpu allocation scheme (e.g., make it 4x the number
of possible CPUs, then double it), userspace can get all of the
'aio-max-nr' value that is made available/visible to it.

Another benefit is a consistent value in '/proc/sys/fs/aio-nr': the
sum of all values as requested by userspace, which is once again less
than or equal to '/proc/sys/fs/aio-max-nr' (not 2x it).

ctx->nr_events
==============

The ctx->nr_events value is the actual size of the ringbuffer (number
of slots), which may be more than what userspace passed to io_setup()
(depending on the requested value for nr_events and/or the
calculations made in aio_setup_ring()), as determined by the percpu
allocation scheme for its correct/fast behavior.
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@xxxxxxxxxxxxxxxxxx>
---
 fs/aio.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 7c3c01f352c1..4967b0e1ef1a 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -706,6 +706,12 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 	int err = -ENOMEM;
 
 	/*
+	 * Store the original value of nr_events from userspace for counting
+	 * against the global limit (aio_max_nr).
+	 */
+	unsigned max_reqs = nr_events;
+
+	/*
 	 * We keep track of the number of available ringbuffer slots, to prevent
 	 * overflow (reqs_available), and we also use percpu counters for this.
 	 *
@@ -723,14 +729,14 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 		return ERR_PTR(-EINVAL);
 	}
 
-	if (!nr_events || (unsigned long)nr_events > (aio_max_nr * 2UL))
+	if (!nr_events || (unsigned long)max_reqs > aio_max_nr)
 		return ERR_PTR(-EAGAIN);
 
 	ctx = kmem_cache_zalloc(kioctx_cachep, GFP_KERNEL);
 	if (!ctx)
 		return ERR_PTR(-ENOMEM);
 
-	ctx->max_reqs = nr_events;
+	ctx->max_reqs = max_reqs;
 
 	spin_lock_init(&ctx->ctx_lock);
 	spin_lock_init(&ctx->completion_lock);
@@ -763,8 +769,8 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
 
 	/* limit the number of system wide aios */
 	spin_lock(&aio_nr_lock);
-	if (aio_nr + nr_events > (aio_max_nr * 2UL) ||
-	    aio_nr + nr_events < aio_nr) {
+	if (aio_nr + ctx->max_reqs > aio_max_nr ||
+	    aio_nr + ctx->max_reqs < aio_nr) {
 		spin_unlock(&aio_nr_lock);
 		err = -EAGAIN;
 		goto err_ctx;
-- 
1.8.3.1