On 05/11/2013 06:23 PM, Kirill A. Shutemov wrote: > From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> > > For now we try to grab a huge cache page if gfp_mask has __GFP_COMP. > It's probably to weak condition and need to be reworked later. > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> > --- > fs/libfs.c | 50 ++++++++++++++++++++++++++++++++++++----------- > include/linux/pagemap.h | 8 ++++++++ > 2 files changed, 47 insertions(+), 11 deletions(-) > > diff --git a/fs/libfs.c b/fs/libfs.c > index 916da8c..ce807fe 100644 > --- a/fs/libfs.c > +++ b/fs/libfs.c > @@ -383,7 +383,7 @@ EXPORT_SYMBOL(simple_setattr); > > int simple_readpage(struct file *file, struct page *page) > { > - clear_highpage(page); > + clear_pagecache_page(page); > flush_dcache_page(page); > SetPageUptodate(page); > unlock_page(page); > @@ -394,21 +394,44 @@ int simple_write_begin(struct file *file, struct address_space *mapping, > loff_t pos, unsigned len, unsigned flags, > struct page **pagep, void **fsdata) > { > - struct page *page; > + struct page *page = NULL; > pgoff_t index; I know ramfs uses simple_write_begin(), but it's not the only one. I think you probably want to create a new ->write_begin() function just for ramfs rather than modifying this one. The optimization that you just put in a few patches ago: >> +static inline struct page *grab_cache_page_write_begin( >> +{ >> + if (!transparent_hugepage_pagecache() && (flags & AOP_FLAG_TRANSHUGE)) >> + return NULL; >> + return __grab_cache_page_write_begin(mapping, index, flags); is now worthless for any user of simple_readpage(). > index = pos >> PAGE_CACHE_SHIFT; > > - page = grab_cache_page_write_begin(mapping, index, flags); > + /* XXX: too weak condition? */ Why would it be too weak? > + if (mapping_can_have_hugepages(mapping)) { > + page = grab_cache_page_write_begin(mapping, > + index & ~HPAGE_CACHE_INDEX_MASK, > + flags | AOP_FLAG_TRANSHUGE); > + /* fallback to small page */ > + if (!page) { > + unsigned long offset; > + offset = pos & ~PAGE_CACHE_MASK; > + len = min_t(unsigned long, > + len, PAGE_CACHE_SIZE - offset); > + } Why does this have to muck with 'len'? It doesn't appear to be undoing anything from earlier in the function. What is it fixing up? > + BUG_ON(page && !PageTransHuge(page)); > + } So, those semantics for AOP_FLAG_TRANSHUGE are actually pretty strong. They mean that you can only return a transparent pagecache page, but you better not return a small page. Would it have been possible for a huge page to get returned from grab_cache_page_write_begin(), but had it split up between there and the BUG_ON()? Which reminds me... under what circumstances _do_ we split these huge pages? How are those circumstances different from the anonymous ones? > + if (!page) > + page = grab_cache_page_write_begin(mapping, index, flags); > if (!page) > return -ENOMEM; > - > *pagep = page; > > - if (!PageUptodate(page) && (len != PAGE_CACHE_SIZE)) { > - unsigned from = pos & (PAGE_CACHE_SIZE - 1); > - > - zero_user_segments(page, 0, from, from + len, PAGE_CACHE_SIZE); > + if (!PageUptodate(page)) { > + unsigned from; > + > + if (PageTransHuge(page) && len != HPAGE_PMD_SIZE) { > + from = pos & ~HPAGE_PMD_MASK; > + zero_huge_user_segment(page, 0, from); > + zero_huge_user_segment(page, > + from + len, HPAGE_PMD_SIZE); > + } else if (len != PAGE_CACHE_SIZE) { > + from = pos & ~PAGE_CACHE_MASK; > + zero_user_segments(page, 0, from, > + from + len, PAGE_CACHE_SIZE); > + } > } > return 0; > } > @@ -443,9 +466,14 @@ int simple_write_end(struct file *file, struct address_space *mapping, > > /* zero the stale part of the page if we did a short copy */ > if (copied < len) { > - unsigned from = pos & (PAGE_CACHE_SIZE - 1); > - > - zero_user(page, from + copied, len - copied); > + unsigned from; > + if (PageTransHuge(page)) { > + from = pos & ~HPAGE_PMD_MASK; > + zero_huge_user(page, from + copied, len - copied); > + } else { > + from = pos & ~PAGE_CACHE_MASK; > + zero_user(page, from + copied, len - copied); > + } > } When I see stuff going in to the simple_* functions, I fear that this code will end up getting copied in to each and every one of the filesystems that implement these on their own. I guess this works for now, but I'm worried that the the next fs is just going to copy-and-paste these. Guess I'll yell at them when they do it. :) -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html