Re: [PATCH 3/3] mm: migrate: Stabilise page count when migrating transparent hugepages

zhong jiang <zhongjiang@xxxxxxxxxx> · Fri, 9 Jun 2017 16:25:26 +0800



On 2017/6/8 20:07, Will Deacon wrote:
> On Thu, Jun 08, 2017 at 12:52:07PM +0200, Vlastimil Babka wrote:
>> On 06/06/2017 07:58 PM, Will Deacon wrote:
>>> When migrating a transparent hugepage, migrate_misplaced_transhuge_page
>>> guards itself against a concurrent fastgup of the page by checking that
>>> the page count is equal to 2 before and after installing the new pmd.
>>>
>>> If the page count changes, then the pmd is reverted back to the original
>>> entry, however there is a small window where the new (possibly writable)
>>> pmd is installed and the underlying page could be written by userspace.
>>> Restoring the old pmd could therefore result in loss of data.
>>>
>>> This patch fixes the problem by freezing the page count whilst updating
>>> the page tables, which protects against a concurrent fastgup without the
>>> need to restore the old pmd in the failure case (since the page count can
>>> no longer change under our feet).
>>>
>>> Cc: Mel Gorman <mgorman@xxxxxxx>
>>> Signed-off-by: Will Deacon <will.deacon@xxxxxxx>
>>> ---
>>>  mm/migrate.c | 15 ++-------------
>>>  1 file changed, 2 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 89a0a1707f4c..8b21f1b1ec6e 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1913,7 +1913,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>>  	int page_lru = page_is_file_cache(page);
>>>  	unsigned long mmun_start = address & HPAGE_PMD_MASK;
>>>  	unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
>>> -	pmd_t orig_entry;
>>>  
>>>  	/*
>>>  	 * Rate-limit the amount of data that is being migrated to a node.
>>> @@ -1956,8 +1955,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>>  	/* Recheck the target PMD */
>>>  	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
>>>  	ptl = pmd_lock(mm, pmd);
>>> -	if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 2)) {
>>> -fail_putback:
>>> +	if (unlikely(!pmd_same(*pmd, entry) || !page_ref_freeze(page, 2))) {
>>>  		spin_unlock(ptl);
>>>  		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
>>>  
>>> @@ -1979,7 +1977,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>>  		goto out_unlock;
>>>  	}
>>>  
>>> -	orig_entry = *pmd;
>>>  	entry = mk_huge_pmd(new_page, vma->vm_page_prot);
>>>  	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
>>>  
>>> @@ -1996,15 +1993,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>> There's a comment above this:
>>
>>        /*
>>          * Clear the old entry under pagetable lock and establish the new PTE.
>>          * Any parallel GUP will either observe the old page blocking on the
>>          * page lock, block on the page table lock or observe the new page.
>>          * The SetPageUptodate on the new page and page_add_new_anon_rmap
>>          * guarantee the copy is visible before the pagetable update.
>>          */
>>
>> Is it still correct? Didn't the freezing prevent some of the cases above?
> I don't think the comment needs to change, the freezing is just doing
> correctly what the code tried to do before. Granted, the blocking might come
> about because of the count momentarily being set to 0 (and
> page_cache_add_speculative bailing), but that's just fastGUP implementation
> details, I think.
  The pagetable lock will prevent the fastgup by userspace.  why the race will come across
  I do not unaderstand , can you explain it please?

 Thanks
 zhongjiang
>>>  	set_pmd_at(mm, mmun_start, pmd, entry);
>>>  	update_mmu_cache_pmd(vma, address, &entry);
>>>  
>>> -	if (page_count(page) != 2) {
>> BTW, how did the old code recognize that page count would increase and then
>> decrease back?
> I'm not sure that case matters because the inc/dec would happen before the
> new PMD is put in place (otherwise it wouldn't be reachable via the
> fastGUP).
>
> Will
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>
>
> .
>


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>