Re: [PATCH v5 06/11] mm: thp: check pmd migration entry in common path

Anshuman Khandual wrote:
> On 04/21/2017 02:17 AM, Zi Yan wrote:
>> From: Zi Yan <zi.yan@xxxxxxxxxxxxxx>
>>
>> If one of the callers of page migration starts to handle thp,
>> memory management code starts to see pmd migration entries, so we need
>> to prepare for it before enabling. This patch changes various code
>> points which check the status of given pmds in order to prevent races
>> between thp migration and pmd-related work.
>>
>> ChangeLog v1 -> v2:
>> - introduce pmd_related() (I know the naming is not good, but I can't
>>   think of a better name. Any suggestion is welcome.)
>>
>> Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
>>
>> ChangeLog v2 -> v3:
>> - add is_swap_pmd()
>> - a pmd entry should be pmd pointing to pte pages, is_swap_pmd(),
>>   pmd_trans_huge(), pmd_devmap(), or pmd_none()
>> - pmd_none_or_trans_huge_or_clear_bad() and pmd_trans_unstable() return
>>   true on pmd_migration_entry, so that migration entries are not
>>   treated as pmd page table entries.
>>
>> ChangeLog v4 -> v5:
>> - add explanation in pmd_none_or_trans_huge_or_clear_bad() to state
>>   the equivalence of !pmd_present() and is_pmd_migration_entry()
>> - fix migration entry wait deadlock code (from v1) in follow_page_mask()
>> - remove unnecessary code (from v1) in follow_trans_huge_pmd()
>> - use is_swap_pmd() instead of !pmd_present() for pmd migration entry,
>>   so it will not be confused with pmd_none()
>> - change author information
>>
>> Signed-off-by: Zi Yan <zi.yan@xxxxxxxxxxxxxx>
>> ---
>>  arch/x86/mm/gup.c             |  7 +++--
>>  fs/proc/task_mmu.c            | 30 +++++++++++++--------
>>  include/asm-generic/pgtable.h | 17 +++++++++++-
>>  include/linux/huge_mm.h       | 14 ++++++++--
>>  mm/gup.c                      | 22 ++++++++++++++--
>>  mm/huge_memory.c              | 61 ++++++++++++++++++++++++++++++++++++++-----
>>  mm/memcontrol.c               |  5 ++++
>>  mm/memory.c                   | 12 +++++++--
>>  mm/mprotect.c                 |  4 +--
>>  mm/mremap.c                   |  2 +-
>>  10 files changed, 145 insertions(+), 29 deletions(-)
>>
>> diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
>> index 456dfdfd2249..096bbcc801e6 100644
>> --- a/arch/x86/mm/gup.c
>> +++ b/arch/x86/mm/gup.c
>> @@ -9,6 +9,7 @@
>>  #include <linux/vmstat.h>
>>  #include <linux/highmem.h>
>>  #include <linux/swap.h>
>> +#include <linux/swapops.h>
>>  #include <linux/memremap.h>
>>  
>>  #include <asm/mmu_context.h>
>> @@ -243,9 +244,11 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
>>  		pmd_t pmd = *pmdp;
>>  
>>  		next = pmd_addr_end(addr, end);
>> -		if (pmd_none(pmd))
>> +		if (!pmd_present(pmd)) {
>> +			VM_BUG_ON(is_swap_pmd(pmd) && IS_ENABLED(CONFIG_MIGRATION) &&
>> +					  !is_pmd_migration_entry(pmd));
>>  			return 0;
>> -		if (unlikely(pmd_large(pmd) || !pmd_present(pmd))) {
>> +		} else if (unlikely(pmd_large(pmd))) {
>>  			/*
>>  			 * NUMA hinting faults need to be handled in the GUP
>>  			 * slowpath for accounting purposes and so that they
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 5c8359704601..57489dcd71c4 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -600,7 +600,8 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>  
>>  	ptl = pmd_trans_huge_lock(pmd, vma);
>>  	if (ptl) {
>> -		smaps_pmd_entry(pmd, addr, walk);
>> +		if (pmd_present(*pmd))
>> +			smaps_pmd_entry(pmd, addr, walk);
>>  		spin_unlock(ptl);
>>  		return 0;
>>  	}
>> @@ -942,6 +943,9 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
>>  			goto out;
>>  		}
>>  
>> +		if (!pmd_present(*pmd))
>> +			goto out;
>> +
> 
> These pmd_present() checks should have been done irrespective of the
> presence of new PMD migration entries. Please separate them out in a
> different clean up patch.

Not really. With PMD migration entries, pmd_trans_huge_lock() now
returns a lock when the PMD is a swap entry (see the changes to
pmd_trans_huge_lock() in this patch). This was not the case before:
pmd_trans_huge_lock() returned NULL if the PMD entry was pmd_none(),
so neither of these two chunks was reachable with a non-present PMD.

Maybe I should use is_swap_pmd() here to avoid the confusion.
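
To illustrate the point, here is a rough user-space sketch, not kernel code. All the model_* names, the enum, and the lock stand-in are made up for this example; the two lock helpers only mirror the before/after behaviour described above. It shows why a caller that takes the huge-PMD lock now needs an extra present check before treating the entry as a mapped huge page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified user-space model. These are NOT the kernel's types or helpers. */
enum model_pmd_kind {
	MODEL_PMD_NONE,        /* empty entry */
	MODEL_PMD_PAGE_TABLE,  /* points to a page of ptes */
	MODEL_PMD_TRANS_HUGE,  /* present huge page mapping */
	MODEL_PMD_DEVMAP,      /* present device mapping */
	MODEL_PMD_MIGRATION,   /* swap-style pmd (migration entry), not present */
};

struct model_pmd {
	enum model_pmd_kind kind;
};

static int model_ptl;	/* stand-in for the page table lock */

static bool model_pmd_present(const struct model_pmd *pmd)
{
	return pmd->kind == MODEL_PMD_PAGE_TABLE ||
	       pmd->kind == MODEL_PMD_TRANS_HUGE ||
	       pmd->kind == MODEL_PMD_DEVMAP;
}

/* Before the patch: a lock is returned only for huge or devmap entries. */
static int *model_huge_lock_old(const struct model_pmd *pmd)
{
	if (pmd->kind == MODEL_PMD_TRANS_HUGE || pmd->kind == MODEL_PMD_DEVMAP)
		return &model_ptl;
	return NULL;
}

/* With the patch: a swap-style pmd also yields a lock. */
static int *model_huge_lock_new(const struct model_pmd *pmd)
{
	if (pmd->kind == MODEL_PMD_MIGRATION)
		return &model_ptl;
	return model_huge_lock_old(pmd);
}

/*
 * Caller shaped like the smaps_pte_range() hunk: returns true only when
 * the entry would be accounted as a mapped huge page.
 */
static bool model_walk_accounts_huge(const struct model_pmd *pmd)
{
	if (model_huge_lock_new(pmd))
		return model_pmd_present(pmd);	/* skip migration entries */
	return false;
}
```

In this model a migration entry gets a lock from model_huge_lock_new() but not from model_huge_lock_old(), so only the new code path can reach the accounting step with a non-present entry.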

<snip>

>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 7406d88445bf..3479e9caf2fa 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -912,6 +912,22 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>  
>>  	ret = -EAGAIN;
>>  	pmd = *src_pmd;
>> +
>> +	if (unlikely(is_swap_pmd(pmd))) {
>> +		swp_entry_t entry = pmd_to_swp_entry(pmd);
>> +
>> +		VM_BUG_ON(IS_ENABLED(CONFIG_MIGRATION) &&
>> +				  !is_pmd_migration_entry(pmd));
>> +		if (is_write_migration_entry(entry)) {
>> +			make_migration_entry_read(&entry);
> 
> We create a read migration entry after detecting a write ?

When copying page tables, COW mappings require the pages in both parent
and child to be mapped read-only. In copy_huge_pmd(), only anonymous
VMAs are copied; the other VMAs will be refilled on fault. Writable
anonymous VMAs have VM_MAYWRITE set but not VM_SHARED, which matches
is_cow_mapping(). So all mappings copied in this function are COW
mappings, and a write migration entry has to be downgraded to read.
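
For reference, the flag test can be sketched in user space as below. The two flag values are copied from include/linux/mm.h as I remember them, and model_is_cow_mapping() is a stand-in name for this example rather than the kernel helper itself, so treat it as a sketch.

```c
#include <stdbool.h>

/* Flag values as in include/linux/mm.h, reproduced for a user-space sketch. */
#define VM_SHARED	0x00000008UL
#define VM_MAYWRITE	0x00000020UL

/*
 * A mapping is COW when it may be written but is not shared:
 * VM_MAYWRITE set and VM_SHARED clear.
 */
static bool model_is_cow_mapping(unsigned long flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}
```

A private writable anonymous VMA passes this test, while a shared one (VM_SHARED | VM_MAYWRITE) or one without VM_MAYWRITE does not.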

> 
>> +			pmd = swp_entry_to_pmd(entry);
>> +			set_pmd_at(src_mm, addr, src_pmd, pmd);
>> +		}
>> +		set_pmd_at(dst_mm, addr, dst_pmd, pmd);
>> +		ret = 0;
>> +		goto out_unlock;
>> +	}
>> +
>>  	if (unlikely(!pmd_trans_huge(pmd))) {
>>  		pte_free(dst_mm, pgtable);
>>  		goto out_unlock;
>> @@ -1218,6 +1234,9 @@ int do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
>>  	if (unlikely(!pmd_same(*vmf->pmd, orig_pmd)))
>>  		goto out_unlock;
>>  
>> +	if (unlikely(!pmd_present(orig_pmd)))
>> +		goto out_unlock;
>> +
>>  	page = pmd_page(orig_pmd);
>>  	VM_BUG_ON_PAGE(!PageCompound(page) || !PageHead(page), page);
>>  	/*
>> @@ -1548,6 +1567,12 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>  	if (is_huge_zero_pmd(orig_pmd))
>>  		goto out;
>>  
>> +	if (unlikely(!pmd_present(orig_pmd))) {
>> +		VM_BUG_ON(IS_ENABLED(CONFIG_MIGRATION) &&
>> +				  !is_pmd_migration_entry(orig_pmd));
>> +		goto out;
>> +	}
>> +
>>  	page = pmd_page(orig_pmd);
>>  	/*
>>  	 * If other processes are mapping this page, we couldn't discard
>> @@ -1758,6 +1783,21 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>>  	preserve_write = prot_numa && pmd_write(*pmd);
>>  	ret = 1;
>>  
>> +	if (is_swap_pmd(*pmd)) {
>> +		swp_entry_t entry = pmd_to_swp_entry(*pmd);
>> +
>> +		VM_BUG_ON(IS_ENABLED(CONFIG_MIGRATION) &&
>> +				  !is_pmd_migration_entry(*pmd));
>> +		if (is_write_migration_entry(entry)) {
>> +			pmd_t newpmd;
>> +
>> +			make_migration_entry_read(&entry);
> 
> Same here or maybe I am missing something.


I follow the same pattern as change_pte_range() (mm/mprotect.c). The
comment there says "A protection check is difficult so just be safe and
disable write".
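
The downgrade itself only changes the entry type and keeps the offset. A minimal user-space sketch follows, assuming a made-up encoding: the type values, the shift, and all model_* names are hypothetical and do not match the real swp_entry_t layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding for illustration: type in the top bits, offset below. */
#define MODEL_TYPE_SHIFT	58
#define MODEL_OFFSET_MASK	((UINT64_C(1) << MODEL_TYPE_SHIFT) - 1)
#define MODEL_MIGRATION_READ	UINT64_C(30)
#define MODEL_MIGRATION_WRITE	UINT64_C(31)

typedef struct {
	uint64_t val;
} model_swp_entry_t;

static model_swp_entry_t model_swp_entry(uint64_t type, uint64_t offset)
{
	model_swp_entry_t e = {
		(type << MODEL_TYPE_SHIFT) | (offset & MODEL_OFFSET_MASK)
	};
	return e;
}

static uint64_t model_swp_type(model_swp_entry_t e)
{
	return e.val >> MODEL_TYPE_SHIFT;
}

static uint64_t model_swp_offset(model_swp_entry_t e)
{
	return e.val & MODEL_OFFSET_MASK;
}

static bool model_is_write_migration_entry(model_swp_entry_t e)
{
	return model_swp_type(e) == MODEL_MIGRATION_WRITE;
}

/* Rebuild the entry with the read type, preserving the offset. */
static void model_make_migration_entry_read(model_swp_entry_t *e)
{
	*e = model_swp_entry(MODEL_MIGRATION_READ, model_swp_offset(*e));
}

/*
 * Same shape as the change_huge_pmd() hunk: downgrade write entries and
 * report whether the stored entry has to be rewritten.
 */
static bool model_write_protect_entry(model_swp_entry_t *e)
{
	if (!model_is_write_migration_entry(*e))
		return false;
	model_make_migration_entry_read(e);
	return true;
}
```

Calling model_write_protect_entry() twice on the same entry changes it only the first time, which is why the hunk only rewrites the pmd inside the is_write_migration_entry() branch.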

-- 
Best Regards,
Yan Zi


