This is a note to let you know that I've just added the patch titled aio: clean up and fix aio_setup_ring page mapping to the 3.12-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: aio-clean-up-and-fix-aio_setup_ring-page-mapping.patch and it can be found in the queue-3.12 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From 3dc9acb67600393249a795934ccdfc291a200e6b Mon Sep 17 00:00:00 2001 From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Date: Fri, 20 Dec 2013 05:11:12 +0900 Subject: aio: clean up and fix aio_setup_ring page mapping From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> commit 3dc9acb67600393249a795934ccdfc291a200e6b upstream. Since commit 36bc08cc01709 ("fs/aio: Add support to aio ring pages migration") the aio ring setup code has used a special per-ring backing inode for the page allocations, rather than just using random anonymous pages. However, rather than remembering the pages as it allocated them, it would allocate the pages, insert them into the file mapping (dirty, so that they couldn't be free'd), and then forget about them. And then to look them up again, it would mmap the mapping, and then use "get_user_pages()" to get back an array of the pages we just created. Now, not only is that incredibly inefficient, it also leaked all the pages if the mmap failed (which could happen due to excessive number of mappings, for example). So clean it all up, making it much more straightforward. Also remove some left-overs of the previous (broken) mm_populate() usage that was removed in commit d6c355c7dabc ("aio: fix race in ring buffer page lookup introduced by page migration support") but left the pointless and now misleading MAP_POPULATE flag around. Tested-and-acked-by: Benjamin LaHaise <bcrl@xxxxxxxxx> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- fs/aio.c | 58 +++++++++++++++++++++++----------------------------------- 1 file changed, 23 insertions(+), 35 deletions(-) --- a/fs/aio.c +++ b/fs/aio.c @@ -326,7 +326,7 @@ static int aio_setup_ring(struct kioctx struct aio_ring *ring; unsigned nr_events = ctx->max_reqs; struct mm_struct *mm = current->mm; - unsigned long size, populate; + unsigned long size, unused; int nr_pages; int i; struct file *file; @@ -347,6 +347,20 @@ static int aio_setup_ring(struct kioctx return -EAGAIN; } + ctx->aio_ring_file = file; + nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring)) + / sizeof(struct io_event); + + ctx->ring_pages = ctx->internal_pages; + if (nr_pages > AIO_RING_PAGES) { + ctx->ring_pages = kcalloc(nr_pages, sizeof(struct page *), + GFP_KERNEL); + if (!ctx->ring_pages) { + put_aio_ring_file(ctx); + return -ENOMEM; + } + } + for (i = 0; i < nr_pages; i++) { struct page *page; page = find_or_create_page(file->f_inode->i_mapping, @@ -358,19 +372,14 @@ static int aio_setup_ring(struct kioctx SetPageUptodate(page); SetPageDirty(page); unlock_page(page); + + ctx->ring_pages[i] = page; } - ctx->aio_ring_file = file; - nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring)) - / sizeof(struct io_event); + ctx->nr_pages = i; - ctx->ring_pages = ctx->internal_pages; - if (nr_pages > AIO_RING_PAGES) { - ctx->ring_pages = kcalloc(nr_pages, sizeof(struct page *), - GFP_KERNEL); - if (!ctx->ring_pages) { - put_aio_ring_file(ctx); - return -ENOMEM; - } + if (unlikely(i != nr_pages)) { + aio_free_ring(ctx); + return -EAGAIN; } ctx->mmap_size = nr_pages * PAGE_SIZE; @@ -379,9 +388,9 @@ static int aio_setup_ring(struct kioctx down_write(&mm->mmap_sem); ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_POPULATE, 0, &populate); + MAP_SHARED, 0, &unused); + up_write(&mm->mmap_sem); if (IS_ERR((void *)ctx->mmap_base)) { - up_write(&mm->mmap_sem); ctx->mmap_size = 0; aio_free_ring(ctx); return -EAGAIN; @@ -389,27 +398,6 @@ static int aio_setup_ring(struct kioctx pr_debug("mmap address: 0x%08lx\n", ctx->mmap_base); - /* We must do this while still holding mmap_sem for write, as we - * need to be protected against userspace attempting to mremap() - * or munmap() the ring buffer. - */ - ctx->nr_pages = get_user_pages(current, mm, ctx->mmap_base, nr_pages, - 1, 0, ctx->ring_pages, NULL); - - /* Dropping the reference here is safe as the page cache will hold - * onto the pages for us. It is also required so that page migration - * can unmap the pages and get the right reference count. - */ - for (i = 0; i < ctx->nr_pages; i++) - put_page(ctx->ring_pages[i]); - - up_write(&mm->mmap_sem); - - if (unlikely(ctx->nr_pages != nr_pages)) { - aio_free_ring(ctx); - return -EAGAIN; - } - ctx->user_id = ctx->mmap_base; ctx->nr_events = nr_events; /* trusted copy */ Patches currently in stable-queue which might be from torvalds@xxxxxxxxxxxxxxxxxxxx are queue-3.12/sched-numa-skip-inaccessible-vmas.patch queue-3.12/memcg-fix-memcg_size-calculation.patch queue-3.12/mm-fix-use-after-free-in-sys_remap_file_pages.patch queue-3.12/auxvec.h-account-for-at_hwcap2-in-at_vector_size_base.patch queue-3.12/mm-compaction-respect-ignore_skip_hint-in-update_pageblock_skip.patch queue-3.12/aio-clean-up-and-fix-aio_setup_ring-page-mapping.patch queue-3.12/sh-always-link-in-helper-functions-extracted-from-libgcc.patch queue-3.12/mm-clear-pmd_numa-before-invalidating.patch queue-3.12/mm-fix-tlb-flush-race-between-migration-and-change_protection_range.patch queue-3.12/mm-numa-ensure-anon_vma-is-locked-to-prevent-parallel-thp-splits.patch queue-3.12/mm-hugetlb-check-for-pte-null-pointer-in-__page_check_address.patch queue-3.12/mm-munlock-fix-deadlock-in-__munlock_pagevec.patch queue-3.12/mm-numa-guarantee-that-tlb_flush_pending-updates-are-visible-before-page-table-updates.patch queue-3.12/mm-numa-avoid-unnecessary-work-on-the-failure-path.patch queue-3.12/mm-page_alloc-revert-numa-aspect-of-fair-allocation-policy.patch queue-3.12/cpupower-fix-segfault-due-to-incorrect-getopt_long-arugments.patch queue-3.12/mm-memory-failure.c-transfer-page-count-from-head-page-to-tail-page-after-split-thp.patch queue-3.12/mm-mempolicy-correct-putback-method-for-isolate-pages-if-failed.patch queue-3.12/mm-memory-failure.c-recheck-pagehuge-after-hugetlb-page-migrate-successfully.patch queue-3.12/kexec-migrate-to-reboot-cpu.patch queue-3.12/mm-munlock-fix-a-bug-where-thp-tail-page-is-encountered.patch -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html