Re: [PATCH 06/33] autonuma: teach gup_fast about pmd_numa

On Thu, Oct 04, 2012 at 01:50:48AM +0200, Andrea Arcangeli wrote:
> In the special "pmd" mode of knuma_scand
> (/sys/kernel/mm/autonuma/knuma_scand/pmd == 1), the pmd may be of numa
> type (_PAGE_PRESENT not set), however the pte might be
> present. Therefore, gup_pmd_range() must return 0 in this case to
> avoid losing a NUMA hinting page fault during gup_fast.
> 

So if gup_fast fails, presumably we fall back to taking the mmap_sem and
calling get_user_pages(). This is a heavier operation and I wonder if the
cost is justified, i.e., is the performance loss from using get_user_pages()
offset by improved NUMA placement? I ask because we always incur the cost of
taking mmap_sem but only sometimes get it back from improved NUMA placement.
How bad would it be if gup_fast lost some of the NUMA hinting information?

> Note: gup_fast will skip over non-present ptes (like numa types), so
> no explicit check is needed for the pte_numa case. gup_fast will also
> skip over THP when the trans huge pmd is non-present. So, the pmd_numa
> case will also be correctly skipped with no additional code changes
> required.
> 
> Acked-by: Rik van Riel <riel@xxxxxxxxxx>
> Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> ---
>  arch/x86/mm/gup.c |   13 ++++++++++++-
>  1 files changed, 12 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
> index 6dc9921..cad7d97 100644
> --- a/arch/x86/mm/gup.c
> +++ b/arch/x86/mm/gup.c
> @@ -169,8 +169,19 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
>  		 * can't because it has irq disabled and
>  		 * wait_split_huge_page() would never return as the
>  		 * tlb flush IPI wouldn't run.
> +		 *
> +		 * The pmd_numa() check is needed because the code
> +		 * doesn't check the _PAGE_PRESENT bit of the pmd if
> +		 * the gup_pte_range() path is taken. NOTE: not all
> +		 * gup_fast users will access the page contents
> +		 * using the CPU through the NUMA memory channels like
> +		 * KVM does. So we're forced to trigger NUMA hinting
> +		 * page faults unconditionally for all gup_fast users
> +		 * even though NUMA hinting page faults aren't useful
> +		 * to I/O drivers that will access the page with DMA
> +		 * and not with the CPU.
>  		 */
> -		if (pmd_none(pmd) || pmd_trans_splitting(pmd))
> +		if (pmd_none(pmd) || pmd_trans_splitting(pmd) || pmd_numa(pmd))
>  			return 0;
>  		if (unlikely(pmd_large(pmd))) {
>  			if (!gup_huge_pmd(pmd, addr, next, write, pages, nr))
> 

-- 
Mel Gorman
SUSE Labs
