Re: [patch 019/154] mm: make madvise(MADV_WILLNEED) support swap file prefetch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 20, 2013 at 06:42:10PM +0100, Andrea Arcangeli wrote:
> > <SNIP>
> > diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> > index f330d28e4d0e..1f8bc7881bdb 100644
> > --- a/include/asm-generic/pgtable.h
> > +++ b/include/asm-generic/pgtable.h
> > @@ -558,6 +558,14 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
> >  }
> >  #endif
> >  
> > +#ifndef pmd_numa
> > +static inline int pmd_numa(pmd_t pmd)
> > +{
> > +	return (pmd_flags(pmd) &
> > +		(_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
> > +}
> > +#endif
> > +
> >  /*
> >   * This function is meant to be used by sites walking pagetables with
> >   * the mmap_sem hold in read mode to protect against MADV_DONTNEED and
> > @@ -601,7 +609,7 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
> >  #endif
> >  	if (pmd_none(pmdval))
> >  		return 1;
> > -	if (unlikely(pmd_bad(pmdval))) {
> > +	if (unlikely(pmd_bad(pmdval) || pmd_numa(pmdval))) {
> >  		if (!pmd_trans_huge(pmdval))
> >  			pmd_clear_bad(pmd);
> >  		return 1;
> > @@ -650,14 +658,6 @@ static inline int pte_numa(pte_t pte)
> >  }
> >  #endif
> >  
> > -#ifndef pmd_numa
> > -static inline int pmd_numa(pmd_t pmd)
> > -{
> > -	return (pmd_flags(pmd) &
> > -		(_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA;
> > -}
> > -#endif
> > -
> 
> If we should do the below I would suggest Mel to decide.
> 
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 3d19994..d70235b 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -532,8 +532,10 @@ static inline int pmd_bad(pmd_t pmd)
>  {
>  #ifdef CONFIG_NUMA_BALANCING
>  	/* pmd_numa check */
> -	if ((pmd_flags(pmd) & (_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA)
> -		return 0;
> +	if ((pmd_flags(pmd) & (_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA) {
> +		pmd_clear_flags(pmd, _PAGE_NUMA);
> +		pmd_set_flags(pmd, _PAGE_PRESENT);
> +	}
>  #endif
>  	return (pmd_flags(pmd) & ~_PAGE_USER) != _KERNPG_TABLE;
>  }
> 

I'm not keen on pmd_bad clearing NUMA hinting information. The name
implies it's a check only and this patch transforms the pmd. It also
puts a new burden on the architecture side that may be easily missed in
the future.

> 
> The reason for not doing this was that it felt slow to have that kind
> of mangling in a fast path bad check. But maybe it's no problem to do
> it. The above should also avoid the bug.
> 

If this bug is related to mprotect then it should also have been fixed
indirectly by commit 1667918b ("mm: numa: clear numa hinting information
on mprotect") although for the wrong reasons. Applying cleanly requires

 mm: clear pmd_numa before invalidating
 mm: numa: do not clear PMD during PTE update scan
 mm: numa: do not clear PTE for pte_numa update
 mm: numa: clear numa hinting information on mprotect

Is this bug reproducible in 3.13-rc5?

> However I would suggest the below, it was overkill strict to depend on
> pmd_bad to fail, and overall it made it weaker to depend on debug
> checks that may change in the future and that can provide false
> negatives (they only should never provide false positives). No need to
> check pmd_trans_huge inside the branch, the pmdval is stable there.
> 
> If in a later patch I'd suggest you to cleanup the barrier with a
> ACCESS_ONCE (ACCESS_ONCE conditional only to
> CONFIG_TRANSPARENT_HUGEPAGE, no need of it if that's not configured
> in) that could give gcc more room for optimizations in this function.
> 
> If CONFIG_TRANSPARENT_HUGEPAGE is not set, if pmd_none fails, the pmd
> cannot change at all.
> 
> Again for the pmd_bad change above it's up to you... it's purely a
> performance/reliability tradeoff.
> 
> From 238436f53937ed0a6821aa380b85366e4f6ad166 Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Date: Fri, 20 Dec 2013 18:24:49 +0100
> Subject: [PATCH] mm: thp: don't depend on pmd_bad to fail on transhuge pmds
> 
> pmd_bad stopped failing on transhuge pmds if the pmd_numa is
> armed. Don't depend on that invariant anymore as it has been broken.
> 
> pmd_bad doesn't actually check if the pmd is bad if the pmd_numa is
> set. An alternative could be to teach it to mangle the local pmd to
> convert it from pmd_numa to non-pmd_numa and then issue a real check
> on it. However that would likely be slower, and keeping the
> pmd_trans_huge check in pmd_none_or_trans_huge_or_clear_bad in an
> unlikely pmd_bad branch was also not optimal.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>

I prefer this patch

Acked-by: Mel Gorman <mgorman@xxxxxxx>

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]