Dear Arcangeli,
I think this problem is closely related to
the race condition fixed by the commit below:
(e86f15ee64d8, mm: vma_merge: fix vm_page_prot SMP race condition
against rmap_walk)
I confirmed that
the thread and its child threads repeatedly call mprotect() with
PROT_NONE and PROT_READ|PROT_WRITE,
although I have not reproduced the problem yet.
Do you think this is one of the symptoms you would expect
from the race condition fixed by the above commit?
Thanks.
Chulmin Kim
On 09/22/2018 12:01 AM, Chulmin Kim wrote:
Hi all.
I am developing an Android smartphone.
I am facing a problem where a thread loops in the page fault routine
forever.
(The kernel version is around v4.4, though it may differ slightly from
mainline, as the problem occurs on a device under development at my
company.)
The pte corresponding to the fault address has PTE_PROT_NONE set and
PTE_VALID cleared.
(By the way, the pte maps an anonymous page (ashmem).)
What seems weird to me is that
the VMA of the fault address is not PROT_NONE but PROT_READ
| PROT_WRITE.
So the page fault routine (handle_pte_fault()) returns 0 and the fault
loops forever.
I don't think this is a normal situation.
As I didn't enable NUMA, a pte with PROT_NONE and !PTE_VALID was most
likely set by mprotect():
1. mprotect(PROT_NONE) -> vma split & pte set to PROT_NONE
2. mprotect(PROT_READ | PROT_WRITE) -> vma merge & pte reverted
I suspect that the pte revert in #2 somehow didn't take effect,
but I have no clue why.
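A minimal user-space sketch of the two steps above, assuming a plain
shared anonymous mapping as a stand-in for ashmem (the helper name and
the three-page layout are hypothetical). It only runs the split/merge
sequence single-threaded, so it is not expected to hit a race by itself;
it just shows the access pattern. If the pte were left PROT_NONE after
step 2 while the VMA is PROT_READ|PROT_WRITE, the write to the middle
page would keep faulting instead of returning.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map three RW pages (stand-in for an ashmem region),
 * then repeatedly flip the middle page to PROT_NONE (splits the VMA) and
 * back to PROT_READ|PROT_WRITE (lets the VMAs merge again), touching the
 * page after each restore. Returns 0 on success, -1 on any failure.
 */
static int mprotect_split_merge_cycle(int iterations)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t len = 3 * (size_t)page;
    char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    char *mid = base + page;
    memset(base, 0xab, len); /* populate the ptes before toggling */

    for (int i = 0; i < iterations; i++) {
        /* Step 1: VMA split, pte becomes PROT_NONE. */
        if (mprotect(mid, (size_t)page, PROT_NONE) != 0)
            goto fail;
        /* Step 2: VMA merge, pte should be reverted to RW. */
        if (mprotect(mid, (size_t)page, PROT_READ | PROT_WRITE) != 0)
            goto fail;
        /* This write would loop in the fault handler if the pte stayed PROT_NONE. */
        mid[0] = (char)i;
        if (mid[0] != (char)i)
            goto fail;
    }

    munmap(base, len);
    return 0;

fail:
    munmap(base, len);
    return -1;
}
```

The race in the commit above additionally needs a concurrent rmap walk
(e.g. page migration or reclaim), so a stress version would run this loop
from several threads while memory pressure or compaction is active.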
I googled and found a similar report
(http://linux-kernel.2935.n7.nabble.com/pipe-page-fault-oddness-td953839.html),
but it involves NUMA and huge page configs,
which my device does not use.
Am I missing a possible scenario, or is this already a known bug?
I would appreciate any ideas about this problem.
Thanks.
Chulmin Kim