On 03/14/2013 10:50 AM, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
>
> For huge page we add to radix tree HPAGE_CACHE_NR pages at once: head
> page for the specified index and HPAGE_CACHE_NR-1 tail pages for
> following indexes.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> ---
>  mm/filemap.c | 76 ++++++++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 53 insertions(+), 23 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 2d99191..6bac9e2 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -447,6 +447,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>  		pgoff_t offset, gfp_t gfp_mask)
>  {
>  	int error;
> +	int nr = 1;
>
>  	VM_BUG_ON(!PageLocked(page));
>  	VM_BUG_ON(PageSwapBacked(page));
> @@ -454,32 +455,61 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>  	error = mem_cgroup_cache_charge(page, current->mm,
>  					gfp_mask & GFP_RECLAIM_MASK);
>  	if (error)
> -		goto out;
> +		return error;
>
> -	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
> -	if (error == 0) {
> -		page_cache_get(page);
> -		page->mapping = mapping;
> -		page->index = offset;
> +	if (PageTransHuge(page)) {
> +		BUILD_BUG_ON(HPAGE_CACHE_NR > RADIX_TREE_PRELOAD_NR);
> +		nr = HPAGE_CACHE_NR;
> +	}

That seems like a slightly odd place to put a BUILD_BUG_ON().  I guess
it doesn't matter to some degree, but does putting it inside the if()
imply anything?
> +	error = radix_tree_preload_count(nr, gfp_mask & ~__GFP_HIGHMEM);
> +	if (error) {
> +		mem_cgroup_uncharge_cache_page(page);
> +		return error;
> +	}
>
> -	spin_lock_irq(&mapping->tree_lock);
> -	error = radix_tree_insert(&mapping->page_tree, offset, page);
> -	if (likely(!error)) {
> -		mapping->nrpages++;
> -		__inc_zone_page_state(page, NR_FILE_PAGES);
> -		spin_unlock_irq(&mapping->tree_lock);
> -		trace_mm_filemap_add_to_page_cache(page);
> -	} else {
> -		page->mapping = NULL;
> -		/* Leave page->index set: truncation relies upon it */
> -		spin_unlock_irq(&mapping->tree_lock);
> -		mem_cgroup_uncharge_cache_page(page);
> -		page_cache_release(page);

I do really like how this rewrite de-indents this code. :)

> +	page_cache_get(page);
> +	spin_lock_irq(&mapping->tree_lock);
> +	page->mapping = mapping;
> +	page->index = offset;
> +	error = radix_tree_insert(&mapping->page_tree, offset, page);
> +	if (unlikely(error))
> +		goto err;
> +	if (PageTransHuge(page)) {
> +		int i;
> +		for (i = 1; i < HPAGE_CACHE_NR; i++) {
> +			page_cache_get(page + i);
> +			page[i].index = offset + i;

Is it OK to leave page->mapping unset for these?

> +			error = radix_tree_insert(&mapping->page_tree,
> +					offset + i, page + i);
> +			if (error) {
> +				page_cache_release(page + i);
> +				break;
> +			}
> +		}

Throughout all this new code, I'd really challenge you to try as much as
possible to minimize the code stuck under "if (PageTransHuge(page))".
For instance, could you change the for() loop a bit and have it shared
between both cases, like:

	for (i = 0; i < nr; i++) {
		page_cache_get(page + i);
		page[i].index = offset + i;
		error = radix_tree_insert(&mapping->page_tree,
				offset + i, page + i);
		if (error) {
			page_cache_release(page + i);
			break;
		}
	}

> -		radix_tree_preload_end();
> -	} else
> -		mem_cgroup_uncharge_cache_page(page);
> -out:
> +		if (error) {
> +			error = ENOSPC; /* no space for a huge page */
> +			for (i--; i > 0; i--) {
> +				radix_tree_delete(&mapping->page_tree,
> +						offset + i);
> +				page_cache_release(page + i);
> +			}
> +			radix_tree_delete(&mapping->page_tree, offset);

I wonder if this would look any nicer if you just did all the
page_cache_get()s for the entire huge page along with the head page, and
then released them all in one place.  I think it might shrink the error
handling paths here.

> +			goto err;
> +		}
> +	}
> +	__mod_zone_page_state(page_zone(page), NR_FILE_PAGES, nr);
> +	mapping->nrpages += nr;
> +	spin_unlock_irq(&mapping->tree_lock);
> +	trace_mm_filemap_add_to_page_cache(page);

Do we need to change the tracing to make sure it notes that these were
or weren't huge pages?

> +	radix_tree_preload_end();
> +	return 0;
> +err:
> +	page->mapping = NULL;
> +	/* Leave page->index set: truncation relies upon it */
> +	spin_unlock_irq(&mapping->tree_lock);
> +	radix_tree_preload_end();
> +	mem_cgroup_uncharge_cache_page(page);
> +	page_cache_release(page);
>  	return error;
>  }
>  EXPORT_SYMBOL(add_to_page_cache_locked);

Does the cgroup code know how to handle these large pages internally
somehow?  It looks like the charge/uncharge is only being done for the
head page.