Dear all,
We have verified, using the problem scenario (repeated execution of Android
apps for 2~3 days), that
the problem is gone after applying this commit:
- e86f15ee64d8, mm: vma_merge: fix vm_page_prot SMP race condition
against rmap_walk
Thanks!
Chulmin Kim
On 09/25/2018 06:08 AM, Andrea Arcangeli wrote:
Hello,
On Sat, Sep 22, 2018 at 01:38:07PM +0900, Chulmin Kim wrote:
Dear Arcangeli,
I think this problem is closely related to
the race condition described in the commit below.
(e86f15ee64d8, mm: vma_merge: fix vm_page_prot SMP race condition
against rmap_walk)
I checked that
the thread and its child threads repeatedly call
mprotect(PROT_{NONE or R|W}),
although I have not reproduced the problem yet.
Do you think this is one of the phenomena you expected
from the race condition described in the above commit?
Yes, that commit will fix your problem in a v4.4-based tree that misses
that fix. You just need to cherry-pick that commit to fix the problem.
Page migrate sets the pte to PROT_NONE by mistake because it runs
concurrently with the mprotect that transitions an adjacent vma from
PROT_NONE to PROT_READ|WRITE. vma_merge (before the fix) temporarily
showed an incorrect PROT_NONE vma prot for the virtual range under page
migration.
With NUMA disabled, it's likely compaction that triggered page migration
for you. Disabling compaction at build time would likely have hidden
the problem. Compaction uses migration and you most certainly have
CONFIG_COMPACTION=y (rightfully so).
On a side note, I suggest cherry-picking the latest upstream commit to
mm/vmacache.c too.
Hope this helps,
Andrea