Re: [PATCH v3 03/14] mm: use pmd lock instead of racy checks in zap_pmd_range()

Zi Yan <zi.yan@xxxxxxxxxxxxxx> · Sun, 12 Feb 2017 18:25:09 -0600

Hi Kirill,

>>>> The crash scenario I guess is like:
>>>> 1. A huge page pmd entry is in the middle of being changed into either a
>>>> pmd_protnone or a pmd_migration_entry. It is cleared to pmd_none.
>>>>
>>>> 2. At the same time, the application frees the vma this page belongs to.
>>>
>>> Em... no.
>>>
>>> This shouldn't be possible: your 1. must be done under down_read(mmap_sem).
>>> And we are only able to remove the vma under down_write(mmap_sem), so the
>>> scenario should be excluded.
>>>
>>> What do I miss?
>>
>> You are right. This problem will not happen in the upstream kernel.
>>
>> The problem comes from my customized kernel, where I migrate pages away
>> instead of reclaiming them when memory is under pressure. I did not take
>> any mmap_sem when I migrate pages. So I got this error.
>>
>> It is a false alarm. Sorry about that. Thanks for clarifying the problem.
>
> I think there's still a race between MADV_DONTNEED and
> change_huge_pmd(.prot_numa=1) resulting in skipping THP by
> zap_pmd_range(). It needs to be addressed.
>
> And MADV_FREE requires a fix.
>
> So, minus one non-bug, plus two bugs.
>

You said that a huge page pmd entry must be changed under down_read(mmap_sem).
That is true only for huge pages, right?

I ask because mm/compaction.c does not take down_read(mmap_sem) during memory
compaction. In other words, base page migrations run without holding mmap_sem,
so zap_pte_range() has to hold the PTE page table lock.
Am I right about this?

If so, IMHO, once we eventually need to compact 2MB pages to form 1GB pages,
zap_pmd_range() will have to take the pmd lock to make that kind of compaction
possible.
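
To make the question concrete, here is a minimal user-space sketch of the
locking pattern I have in mind. It is not kernel code: the toy_* names, the
state enum, and the pthread mutex standing in for the pmd page table lock are
all assumptions made for illustration. The migration side takes no mmap_sem
equivalent, so the only thing that keeps the zap side from seeing the
transient cleared entry is that both sides check and modify the entry under
the same lock.

```c
#include <pthread.h>

/* Toy stand-ins for pmd states; these are NOT the kernel's encodings. */
enum toy_state { TOY_NONE, TOY_PRESENT, TOY_MIGRATION };

struct toy_pmd {
	pthread_mutex_t ptl;	/* plays the role of the pmd page table lock */
	enum toy_state state;
};

/* Migration path: holds only the page table lock, no mmap_sem equivalent. */
static void toy_migrate(struct toy_pmd *pmd)
{
	pthread_mutex_lock(&pmd->ptl);
	if (pmd->state == TOY_PRESENT) {
		pmd->state = TOY_NONE;		/* transient cleared state */
		pmd->state = TOY_MIGRATION;	/* install the migration entry */
	}
	pthread_mutex_unlock(&pmd->ptl);
}

/* Zap path: checks the entry under the lock; returns 1 if it zapped it. */
static int toy_zap(struct toy_pmd *pmd)
{
	int zapped = 0;

	pthread_mutex_lock(&pmd->ptl);
	if (pmd->state != TOY_NONE) {
		pmd->state = TOY_NONE;
		zapped = 1;
	}
	pthread_mutex_unlock(&pmd->ptl);
	return zapped;
}

static void *toy_migrate_thread(void *arg)
{
	toy_migrate(arg);
	return NULL;
}

/* Run one migrate/zap race; returns what toy_zap() reported. */
static int toy_race_once(void)
{
	struct toy_pmd pmd = { .state = TOY_PRESENT };
	pthread_t t;
	int zapped;

	pthread_mutex_init(&pmd.ptl, NULL);
	pthread_create(&t, NULL, toy_migrate_thread, &pmd);
	zapped = toy_zap(&pmd);
	pthread_join(t, NULL);
	pthread_mutex_destroy(&pmd.ptl);
	return zapped;
}
```

In this model toy_race_once() reports a zap whichever thread wins the lock,
because the zap side can only observe TOY_PRESENT or TOY_MIGRATION. If
toy_zap() checked the state without taking the lock, it could observe the
transient TOY_NONE and skip the entry.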

Do you agree?

--
Best Regards
Yan Zi