Re: [PATCHv4] shmem: avoid huge pages for small files

Hugh Dickins <hughd@xxxxxxxxxx> · Fri, 11 Nov 2016 13:41:11 -0800 (PST)

On Thu, 10 Nov 2016, Kirill A. Shutemov wrote:
> On Mon, Nov 07, 2016 at 03:17:11PM -0800, Hugh Dickins wrote:
> 
> > Treating the first extent differently is a hack, and does not respect
> > that this is a filesystem, on which size is likely to increase.
> > 
> > By all means refine the condition for huge=within_size, and by all means
> > warn in transhuge.txt that huge=always may tend to waste valuable huge
> > pages if the filesystem is used for small files without good reason
> 
> Would it be okay, if I just replace huge=within_size logic with what I
> proposed here for huge=always?

In principle yes, that would be fine with me: I just don't care very
much about this option, since we do not force "huge=always" on anyone,
so everyone is free to use it where it's useful, and not where it's not.

But perhaps your aim is to have "huge=within_size" set by default on /tmp,
and so not behave badly there: I'd never aimed for that, and I'm a bit
sceptical about it, but if you can get good enough behaviour out of it
for that, I won't stand in your way.

> 
> That's not what I intended initially for this option, but...
> 
> > (but maybe the implementation needs to reclaim those more effectively).
> 
> It's more about cost of allocation than memory pressure.

Regarding that issue, I think you should reconsider the GFP flags used
in shmem_alloc_hugepage().  GFP flags, and compaction latency avoidance,
have been moving targets over the last year, and I've not rechecked;
but I got the impression that your GFP flags are still asking for the
compaction stalls that are now deprecated on the anon THP fault path?
I repeat, I've not rechecked that before writing, maybe it's a libel!

> 
> -----8<-----
> 
> From 287ab05c09bfd49c7356ca74b6fea36d8131edaf Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
> Date: Mon, 17 Oct 2016 14:44:47 +0300
> Subject: [PATCH] shmem: avoid huge pages for small files
> 
> Huge pages are detrimental for small file: they causes noticible
> overhead on both allocation performance and memory footprint.
> 
> This patch aimed to address this issue by avoiding huge pages until
> file grown to size of huge page if the filesystem mounted with
> huge=within_size option.
> 
> This would cover most of the cases where huge pages causes regressions
> in performance.

It's not a regression if "huge=always" is worse than "huge=never" in
some cases: just cases where it's better not to mount "huge=always".

> 
> The limit doesn't affect khugepaged behaviour: it still can collapse
> pages based on its settings.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> ---
>  Documentation/vm/transhuge.txt | 7 ++++++-
>  mm/shmem.c                     | 6 ++----
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
> index 2ec6adb5a4ce..14c911c56f4a 100644
> --- a/Documentation/vm/transhuge.txt
> +++ b/Documentation/vm/transhuge.txt
> @@ -208,11 +208,16 @@ You can control hugepage allocation policy in tmpfs with mount option
>    - "always":
>      Attempt to allocate huge pages every time we need a new page;
>  

Nit: please change the semi-colon to full-stop, and delete the blank line.

> +    This option can lead to significant overhead if filesystem is used to
> +    store small files.
> +
>    - "never":
>      Do not allocate huge pages;
>  
>    - "within_size":
> -    Only allocate huge page if it will be fully within i_size.
> +    Only allocate huge page if size of the file more than size of huge
> +    page. This helps to avoid overhead for small files.
> +
>      Also respect fadvise()/madvise() hints;
>  
>    - "advise:
> diff --git a/mm/shmem.c b/mm/shmem.c
> index ad7813d73ea7..3589d36c7c63 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1681,10 +1681,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
>  		case SHMEM_HUGE_NEVER:
>  			goto alloc_nohuge;
>  		case SHMEM_HUGE_WITHIN_SIZE:
> -			off = round_up(index, HPAGE_PMD_NR);
> -			i_size = round_up(i_size_read(inode), PAGE_SIZE);
> -			if (i_size >= HPAGE_PMD_SIZE &&
> -					i_size >> PAGE_SHIFT >= off)
> +			i_size = i_size_read(inode);
> +			if (index >= HPAGE_PMD_NR || i_size >= HPAGE_PMD_SIZE)
>  				goto alloc_huge;

I said fine in principle above, but when I look at this, I'm puzzled.

Certainly the new condition is easier to understand than the old condition:
which is a plus, even though it's hackish (I do dislike hobbling the first
extent, when it's an incomplete last extent which deserves to be hobbled -
easier said than implemented of course).

But isn't the new condition (with its ||) always weaker than the old
condition (with its &&)?  Whereas I thought you were trying to change
it to be less keen to allocate hugepages, not more.

What the condition ought to say, I don't know: I got too confused,
and depressed by my confusion, so I'm just handing it back to you.

And then there's the SHMEM_HUGE_WITHIN_SIZE case in shmem_huge_enabled()
(for khugepaged), which you have explicitly not changed in this patch:
looks strange to me, is it doing the right thing?

>  			/* fallthrough */
>  		case SHMEM_HUGE_ADVISE:
> -- 
>  Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>