On Thu, 10 Nov 2016, Kirill A. Shutemov wrote: > On Mon, Nov 07, 2016 at 03:17:11PM -0800, Hugh Dickins wrote: > > > Treating the first extent differently is a hack, and does not respect > > that this is a filesystem, on which size is likely to increase. > > > > By all means refine the condition for huge=within_size, and by all means > > warn in transhuge.txt that huge=always may tend to waste valuable huge > > pages if the filesystem is used for small files without good reason > > Would it be okay, if I just replace huge=within_size logic with what I > proposed here for huge=always? In principle yes, that would be fine with me: I just don't care very much about this option, since we do not force "huge=always" on anyone, so everyone is free to use it where it's useful, and not where it's not. But perhaps your aim is to have "huge=within_size" set by default on /tmp, and so not behave badly there: I'd never aimed for that, and I'm a bit sceptical about it, but if you can get good enough behaviour out of it for that, I won't stand in your way. > > That's not what I intended initially for this option, but... > > > (but maybe the implementation needs to reclaim those more effectively). > > It's more about cost of allocation than memory pressure. Regarding that issue, I think you should reconsider the GFP flags used in shmem_alloc_hugepage(). GFP flags, and compaction latency avoidance, have been moving targets over the last year, and I've not rechecked; but I got the impression that your GFP flags are still asking for the compaction stalls that are now deprecated on the anon THP fault path? I repeat, I've not rechecked that before writing, maybe it's a libel! > > -----8<----- > > From 287ab05c09bfd49c7356ca74b6fea36d8131edaf Mon Sep 17 00:00:00 2001 > From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> > Date: Mon, 17 Oct 2016 14:44:47 +0300 > Subject: [PATCH] shmem: avoid huge pages for small files > > Huge pages are detrimental for small file: they causes noticible > overhead on both allocation performance and memory footprint. > > This patch aimed to address this issue by avoiding huge pages until > file grown to size of huge page if the filesystem mounted with > huge=within_size option. > > This would cover most of the cases where huge pages causes regressions > in performance. It's not a regression if "huge=always" is worse than "huge=never" in some cases: just cases where it's better not to mount "huge=always". > > The limit doesn't affect khugepaged behaviour: it still can collapse > pages based on its settings. > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> > --- > Documentation/vm/transhuge.txt | 7 ++++++- > mm/shmem.c | 6 ++---- > 2 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt > index 2ec6adb5a4ce..14c911c56f4a 100644 > --- a/Documentation/vm/transhuge.txt > +++ b/Documentation/vm/transhuge.txt > @@ -208,11 +208,16 @@ You can control hugepage allocation policy in tmpfs with mount option > - "always": > Attempt to allocate huge pages every time we need a new page; > Nit: please change the semi-colon to full-stop, and delete the blank line. > + This option can lead to significant overhead if filesystem is used to > + store small files. > + > - "never": > Do not allocate huge pages; > > - "within_size": > - Only allocate huge page if it will be fully within i_size. > + Only allocate huge page if size of the file more than size of huge > + page. This helps to avoid overhead for small files. > + > Also respect fadvise()/madvise() hints; > > - "advise: > diff --git a/mm/shmem.c b/mm/shmem.c > index ad7813d73ea7..3589d36c7c63 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -1681,10 +1681,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index, > case SHMEM_HUGE_NEVER: > goto alloc_nohuge; > case SHMEM_HUGE_WITHIN_SIZE: > - off = round_up(index, HPAGE_PMD_NR); > - i_size = round_up(i_size_read(inode), PAGE_SIZE); > - if (i_size >= HPAGE_PMD_SIZE && > - i_size >> PAGE_SHIFT >= off) > + i_size = i_size_read(inode); > + if (index >= HPAGE_PMD_NR || i_size >= HPAGE_PMD_SIZE) > goto alloc_huge; I said fine in principle above, but when I look at this, I'm puzzled. Certainly the new condition is easier to understand than the old condition: which is a plus, even though it's hackish (I do dislike hobbling the first extent, when it's an incomplete last extent which deserves to be hobbled - easier said than implemented of course). But isn't the new condition (with its ||) always weaker than the old condition (with its &&)? Whereas I thought you were trying to change it to be less keen to allocate hugepages, not more. What the condition ought to say, I don't know: I got too confused, and depressed by my confusion, so I'm just handing it back to you. And then there's the SHMEM_HUGE_WITHIN_SIZE case in shmem_huge_enabled() (for khugepaged), which you have explicitly not changed in this patch: looks strange to me, is it doing the right thing? > /* fallthrough */ > case SHMEM_HUGE_ADVISE: > -- > Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>