Re: [PATCH] thp: close race between split and zap huge pages

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Wed, 16 Apr 2014 13:19:42 -0700

On Wed, 16 Apr 2014 00:48:35 +0300 "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:

> Sasha Levin has reported two THP BUGs[1][2]. I believe both of them have
> the same root cause. Let's look at them one by one.
> 
> The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!".
> It's BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page().
> From my testing I see that page_mapcount() is higher than mapcount here.
> 
> I think it happens due to a race between zap_huge_pmd() and
> page_check_address_pmd(). page_check_address_pmd() misses a PMD
> which is under zap:

Why did this bug happen?

In other words, what earlier mistakes had we made which led to you
getting this locking wrong?  

Based on that knowledge, what can we do to reduce the likelihood of
such mistakes being made in the future?  (Hint: the answer to this
will involve making changes to this patch).

> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1536,16 +1536,23 @@ pmd_t *page_check_address_pmd(struct page *page,
>  			      enum page_check_address_pmd_flag flag,
>  			      spinlock_t **ptl)
>  {
> +	pgd_t *pgd;
> +	pud_t *pud;
>  	pmd_t *pmd;
>  
>  	if (address & ~HPAGE_PMD_MASK)
>  		return NULL;
>  
> -	pmd = mm_find_pmd(mm, address);
> -	if (!pmd)
> +	pgd = pgd_offset(mm, address);
> +	if (!pgd_present(*pgd))
>  		return NULL;
> +	pud = pud_offset(pgd, address);
> +	if (!pud_present(*pud))
> +		return NULL;
> +	pmd = pmd_offset(pud, address);
> +
>  	*ptl = pmd_lock(mm, pmd);
> -	if (pmd_none(*pmd))
> +	if (!pmd_present(*pmd))
>  		goto unlock;
>  	if (pmd_page(*pmd) != page)
>  		goto unlock;

So how do other callers of mm_find_pmd() manage to avoid this race, or
are they all buggy?

Is mm_find_pmd() really so simple and obvious that we can afford to
leave it undocumented?
