Report A bug of PTE attribute set for mprotect

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dear All:
 
Sorry for disturbing all of you. Recently I was traped by a memory issue, and finally I find it may be sync problems of mprotect sys call.
 
Background of this problem:
Our kernel version is 3.14.29, When playing Netflix and switch with other apps on Android platform for stress test, after a whole day test, Netflix become no response. Using top command we find it took 25% CPU usage on a 4x cortex-a53 cpu platform. Which means that one cpu was 100% occupied by netfilx. We have tried to debug it and found over 150000 page fault happen in binder thread in one seconds. And fault addresses were always same. Finally we found that attribute of PTE for this address was not match property of it's VMA flags:
 
Log:
[00:11:40][ 8025.477715@1] f176f000-f1773000 r-xp 00000000 00:00 0 
[00:11:40][ 8025.478935@1] (1)[8223:Binder:269_3]pgd = ffffffc034007000
[00:11:40][ 8025.480320@1] [f17701bc] pgd:ffffffc034007018, *pgd=000000002dbcd003, pmd:ffffffc02dbcdc58, *pmd=000000003408b003, pte:ffffffc03408bb80, *pte=006000001e270fd3
 
Property of this vma was R+X, but PTE for fault address did not have execuate permission, because BIT54(PTE_UXN ) was set. So MMU gave permmsion fault for this address but fault handler of arm64 default code did not check this problem, it found that this address had a non-zero pte and other flags seems ok, so handler did nothing and return to user space to keep execute on this page. And that caused page fault happen again and again, with 100% CPU.
 
After many times of debug we got a log which show almost full steps of this scenario(debug patch see "60fead12.diff.zip" in attach):
1. Bad pte was at address ffffffc034155eb8. First time set pte in this address was here(see around line 10271 in attached log file), from an empty value 0 to a write pte for virtual address e91d7000, at this time vma range was e91d7000-e91d8000, call stack show it was a normal page fault.
[18297.354385@2] !(2)[6576:Binder:272_5]set pte at ffffffc034155eb8, e91d7000, 0->2e00000000acf53, ip:handle_mm_fault, mm:ffffffc034a3ea00
[18297.357234@2] !(2)[6576:Binder:272_5]vma:ffffffc0074b8d48 f:80100072 (null),[e91d7000-e91d8000] p:0, fc:1, ip:mmap_region
[18297.360162@2] !(2)[6576:Binder:272_5]Call trace:
[18297.361278@2] !(2)[6576:Binder:272_5][<ffffffc0010894a4>] dump_backtrace+0x0/0x174
[18297.363105@2] !(2)[6576:Binder:272_5][<ffffffc001089634>] show_stack+0x1c/0x28
[18297.364898@2] !(2)[6576:Binder:272_5][<ffffffc001096630>] set_pte+0xe8/0x124
[18297.366590@2] !(2)[6576:Binder:272_5][<ffffffc00119889c>] handle_mm_fault+0x74c/0xc74
[18297.368534@2] !(2)[6576:Binder:272_5][<ffffffc001096b58>] do_page_fault+0x220/0x40c
[18297.370370@2] !(2)[6576:Binder:272_5][<ffffffc0010812cc>] do_mem_abort+0x4c/0xb0
 
2. Then mprotect syscall to change it from write page to R+X page, pte value also change to execute attribute, with pte bit 54 PTE_UXN cleared, during mprotect, this vma range [e91d7000 - e91d8000] was merged with range  [e91d8000 - e91db000] to [e91d7000 - e91db000], vma is R+X property:
[18297.398925@2] (2)[6576:Binder:272_5]mprotect_fixup, prop range[e91d7000-e91d8000], for vma:ffffffc0074b8d48,[e91d7000-e91db000], flag:80100072->80100075, fc:1, ip:mmap_region, mm:ffffffc034a3ea00
[18297.403127@2] !(2)[6576:Binder:272_5]set pte at ffffffc034155eb8, e91d7000, 2e00000000acf53->0, ip:change_protection_range, mm:ffffffc034a3ea00
[18297.406262@2] !(2)[6576:Binder:272_5]vma:ffffffc0074b8d48 f:80100075 (null),[e91d7000-e91db000] p:1, fc:1, ip:mmap_region
[18297.409140@2] !(2)[6576:Binder:272_5]set pte at ffffffc034155eb8, e91d7000, 0->a00000000acfd3, ip:change_protection_range, mm:ffffffc034a3ea00
[18297.412266@2] !(2)[6576:Binder:272_5]vma:ffffffc0074b8d48 f:80100075 (null),[e91d7000-e91db000] p:1, fc:1, ip:mmap_region
[18297.415141@2] !(2)[6576:Binder:272_5]change_pte_range 79, vma:ffffffc0074b8d48, modify e91d7000, pte:ffffffc034155eb8, 2e00000000acf53->a00000000acf53
 
3. Then another page fault happened for virtual address e91d5000, it's vma is ffffffc006c05190, range [e91d5000-e91d6000], only 4KB, with write property:
[18297.815796@1] !(1)[6576:Binder:272_5]set pte at ffffffc034155ea8, e91d5000, 0->2e000001c9f8f53, ip:handle_mm_fault, mm:ffffffc034a3ea00
[18297.818694@1] !(1)[6576:Binder:272_5]vma:ffffffc006c05190 f:80100072 (null),[e91d5000-e91d6000] p:0, fc:1, ip:mmap_region
[18297.821521@1] !(1)[6576:Binder:272_5]Call trace:
[18297.822635@1] !(1)[6576:Binder:272_5][<ffffffc0010894a4>] dump_backtrace+0x0/0x174
[18297.824474@1] !(1)[6576:Binder:272_5][<ffffffc001089634>] show_stack+0x1c/0x28
[18297.826210@1] !(1)[6576:Binder:272_5][<ffffffc001096630>] set_pte+0xe8/0x124
[18297.827955@1] !(1)[6576:Binder:272_5][<ffffffc00119889c>] handle_mm_fault+0x74c/0xc74
[18297.829846@1] !(1)[6576:Binder:272_5][<ffffffc001096b58>] do_page_fault+0x220/0x40c
[18297.831774@1] !(1)[6576:Binder:272_5][<ffffffc0010812cc>] do_mem_abort+0x4c/0xb0
 
4. Then pte at address ffffffc034155eb8 was modified by migration process. Migrate function first unmap vma with this pte, in function try_to_unmap, pte value was set from nomal R+X value to an swap entry. Ummaped vma was ffffffc006c05000, range [e91d6000-e91db000] with R+X property. Virtual address range for this migrate process is e91d7000.
Problem is in last step of fucntion remove_migration_pte, it find vma ffffffc006c05190, range [e91d5000-e91db000](24KB) with write-only proptery for virtual adress e91d7000. Which was not same as in try_to_unmap function. In step 3 we know that vma ffffffc006c05190 is only have 4KB range [e91d5000-e91d6000], with write-only property. But at here it grown up to e91db000. Before remove_migration_pte function vma ffffffc006c05000 holds vma range [e91d6000-e91db000] with R+X property. 
After function remove_migration_pte, pte value at address ffffffc034155eb8 was changed to 6000000f9c6bd3, bit 54 PTE_UXN was set and it can't execute.
 
[18297.860182@3] !(3)[6396:ce.core.androi]set pte at ffffffc034155eb8, e91d7000, a00000000acfd3->0, ip:ptep_clear_flush, mm:ffffffc034a3ea00
[18297.863204@3] !(3)[6396:ce.core.androi]vma:ffffffc006c05000 f:80100075 (null),[e91d6000-e91db000] p:1, fc:1, ip:mmap_region
[18297.866143@3] !(3)[6396:ce.core.androi]set pte at ffffffc034155eb8, e91d7000, 0->158f0, ip:try_to_unmap_one, mm:ffffffc034a3ea00
[18297.868963@3] !(3)[6396:ce.core.androi]vma:ffffffc006c05000 f:80100075 (null),[e91d6000-e91db000] p:1, fc:1, ip:mmap_region
~~~~ These two steps were ok, it find right vma and right range, pte attribute were ok.
[18297.874197@3] !(3)[6396:ce.core.androi]set pte at ffffffc034155eb8, e91d7000, 158f0->6000000f9c6bd3, ip:remove_migration_pte, mm:ffffffc034a3ea00
[18297.876036@3] !(3)[6396:ce.core.androi]vma:ffffffc006c05190 f:80100072 (null),[e91d5000-e91db000] p:0, fc:1, ip:mmap_region
[18297.878995@3] !(3)[6396:ce.core.androi]Call trace:
[18297.880134@3] !(3)[6396:ce.core.androi][<ffffffc0010894a4>] dump_backtrace+0x0/0x174
[18297.882062@3] !(3)[6396:ce.core.androi][<ffffffc001089634>] show_stack+0x1c/0x28
[18297.883829@3] !(3)[6396:ce.core.androi][<ffffffc001096630>] set_pte+0xe8/0x124
[18297.885632@3] !(3)[6396:ce.core.androi][<ffffffc0011b966c>] remove_migration_pte+0x198/0x274
[18297.887682@3] !(3)[6396:ce.core.androi][<ffffffc0011a2c68>] rmap_walk+0x1fc/0x2bc
[18297.889483@3] !(3)[6396:ce.core.androi][<ffffffc0011b94c4>] remove_migration_ptes+0x48/0x58
[18297.891557@3] !(3)[6396:ce.core.androi][<ffffffc0011ba774>] move_to_new_page+0x1a4/0x27c
[18297.893498@3] !(3)[6396:ce.core.androi][<ffffffc0011baf78>] migrate_pages+0x574/0x648
[18297.895442@3] !(3)[6396:ce.core.androi][<ffffffc0011910e0>] compact_zone+0x380/0x4c4
[18297.897300@3] !(3)[6396:ce.core.androi][<ffffffc0011912f0>] compact_zone_order+0xcc/0x110
[18297.899332@3] !(3)[6396:ce.core.androi][<ffffffc00119160c>] try_to_compact_pages+0x108/0x1b8
[18297.901390@3] !(3)[6396:ce.core.androi][<ffffffc001174ab4>] __alloc_pages_direct_compact+0x8c/0x260
[18297.903571@3] !(3)[6396:ce.core.androi][<ffffffc001175298>] __alloc_pages_nodemask+0x610/0x978
[18297.905719@3] !(3)[6396:ce.core.androi][<ffffffc00109db20>] copy_process.part.49+0x12c/0x14a8
[18297.907795@3] !(3)[6396:ce.core.androi][<ffffffc00109f05c>] do_fork+0xec/0x3b0
[18297.909520@3] !(3)[6396:ce.core.androi][<ffffffc00109f40c>] SyS_clone+0x3c/0x4c
 
5.Just after this call stack an mprotect sys call continue execute to change vma ffffffc006c05190, range[e91d5000-e91d6000] from write only to R+X:
[18297.914242@1]  (1)[6576:Binder:272_5]mprotect_fixup, prop range[e91d5000-e91d6000], for vma:ffffffc006c05190,[e91d5000-e91db000], flag:80100072->80100075, fc:1, ip:mmap_region, mm:ffffffc034a3ea00
[18297.917205@1] !(1)[6576:Binder:272_5]set pte at ffffffc034155ea8, e91d5000, 2e000001c9f8f53->0, ip:change_protection_range, mm:ffffffc034a3ea00
[18297.920369@1] !(1)[6576:Binder:272_5]vma:ffffffc006c05190 f:80100075           (null),[e91d5000-e91db000] p:1, fc:1, ip:mmap_region
[18297.923277@1] !(1)[6576:Binder:272_5]set pte at ffffffc034155ea8, e91d5000, 0->a000001c9f8fd3, ip:change_protection_range, mm:ffffffc034a3ea00
[18297.926367@1] !(1)[6576:Binder:272_5]vma:ffffffc006c05190 f:80100075           (null),[e91d5000-e91db000] p:1, fc:1, ip:mmap_region
[18297.929244@1] !(1)[6576:Binder:272_5]change_pte_range 79, vma:ffffffc006c05190, modify e91d5000, pte:ffffffc034155ea8, 2e000001c9f8f53->a000001c9f8f53
[18297.932559@1] !(1)[6576:Binder:272_5]change_pte_range,[e91d6000-e91d6000] vma ffffffc006c05190, changed page:1, ptes:1, empty:0, fc:1
[18297.935498@1]  (1)[6576:Binder:272_5]SYSC_mprotect, vma:ffffffc006c05190[e91d5000-e91db000], tmp:e91d6000, start:e91d5000, prot:5, len:4096, prev:ffffffc006c05190
 
6. Finally application jump back to run at virtual adddress e91d70bc, but it's pte value is wrong, so page fault happen again and again.
[18298.889388@1] !(1)[6576:Binder:272_5]set pte at ffffffc034155eb8, e91d7000, 6000000f9c6bd3->6000000f9c6fd3, ip:ptep_set_access_flags, mm:ffffffc034a3ea00
[18298.892664@1] !(1)[6576:Binder:272_5]vma:ffffffc006c05bb8 f:80100075 (null),[e91d7000-e91db000] p:1, fc:2, ip:mmap_region
[18298.895511@1] !(1)[6576:Binder:272_5]Call trace:
[18298.896614@1] !(1)[6576:Binder:272_5][<ffffffc0010894a4>] dump_backtrace+0x0/0x174
[18298.898505@1] !(1)[6576:Binder:272_5][<ffffffc001089634>] show_stack+0x1c/0x28
[18298.900224@1] !(1)[6576:Binder:272_5][<ffffffc001096630>] set_pte+0xe8/0x124
[18298.901978@1] !(1)[6576:Binder:272_5][<ffffffc0011a6fe4>] ptep_set_access_flags+0x5c/0xd0
[18298.903978@1] !(1)[6576:Binder:272_5][<ffffffc001198578>] handle_mm_fault+0x428/0xc74
[18298.905890@1] !(1)[6576:Binder:272_5][<ffffffc001096b58>] do_page_fault+0x220/0x40c
[18298.907791@1] !(1)[6576:Binder:272_5][<ffffffc0010812cc>] do_mem_abort+0x4c/0xb0
[18298.909529@1] !(1)[6576:Binder:272_5]Exception stack(0xffffffc003797e30 to 0xffffffc003797f50)
[18298.911667@1] !(1)[6576:Binder:272_5]7e20: 00000000 00000000 00000000 00000000
[18298.914231@1] !(1)[6576:Binder:272_5]7e40: ffffffff ffffffff e91d70bc 00000000 34a3ea60 ffffffc0 01103fcc ffffffc0
[18298.916692@1] !(1)[6576:Binder:272_5]7e60: 03797ea0 ffffffc0 0119eb04 ffffffc0 e91d6000 00000000 00000000 00000000
[18298.919279@1] !(1)[6576:Binder:272_5]7e80: ffffffff ffffffff 0119eaf8 ffffffc0 0108e000 ffffffc0 000001f0 00000000
[18298.921796@1] !(1)[6576:Binder:272_5]7ea0: 00000000 00000000 0108426c ffffffc0 00000000 00000000 e91d5000 00000000
[18298.924328@1] !(1)[6576:Binder:272_5]7ec0: 00000000 00000000 000001f0 00000000 53266ef5 00000000 53266ef5 00000000
[18298.926805@1] !(1)[6576:Binder:272_5]7ee0: e73ff260 00000000 e7e9f525 00000000 e82968f8 00000000 e7e9f371 00000000
[18298.929396@1] !(1)[6576:Binder:272_5]7f00: 00000320 00000000 00000003 00000000 e73ff27c 00000000 00000000 00000000
[18298.931909@1] !(1)[6576:Binder:272_5]7f20: e73ff11c 00000000 e7effde0 00000000 e91d6000 00000000 e73ff108 00000000
[18298.934441@1] !(1)[6576:Binder:272_5]7f40: e91d70bd 00000000 00000000 00000000
[18298.936280@1] (1)[6576:Binder:272_5]check_pte 3832, pte:ffffffc034155eb8, 6000000f9c6fd3, 6000000f9c6fd3 vma:ffffffc006c05bb8, e91d7000 e91db000, addr:e91d7000, fc:2
[18298.939893@1] (1)[6576:Binder:272_5]check_pte 3832, ip:mmap_region, pte attr missmatch, back up pte:
[18298.942126@1] (1)[6576:Binder:272_5]ffffffc034155e98: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[18298.944794@1] (1)[6576:Binder:272_5]ffffffc034155eb8: 006000000f9c6bd3 00a0000031445fd3 00200000099edbd3 0060000023612bd3
 
Following picture show situation around step 4: vma@ffffffc006c05190 has range e91d5000-e91d6000, vma@ffffffc006c05000 has range e91d6000 - e91db000, they were neighbour, but property not same.
At this time mprtoect syscall change vma@ffffffc006c05190 from write only to R+X, so it can be merged with vma@ffffffc006c05000, but during merge process a migration of virtual address e91d7000 happended, and at last step of migration process, it find vma@ffffffc006c05190 with property W(NOT R+X) and set this property for it's PTE. That caused the problem.
 
I reviewed code on 3.14.29 kernel of mprotect function and migrate functin, found that there was a race condtion during vma merge:
mm/mprotect.c
mprotect()
down_write(&current->mm->mmap_sem);
...
mprotect_fixup()
254     *pprev = vma_merge(mm, *pprev, start, end, newflags,
255             vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
256             vma_get_anon_name(vma));
257     if (*pprev) {
258         vma = *pprev;
259         goto success;
260     }
261
262     *pprev = vma;
263
264     if (start != vma->vm_start) {
265         error = split_vma(mm, vma, start, 1);
266         if (error)
267             goto fail;
268     }
269
270     if (end != vma->vm_end) {
271         error = split_vma(mm, vma, end, 0);
272         if (error)
273             goto fail;
274     }
275
276 success:
277     /*
278      * vm_flags and vm_page_prot are protected by the mmap_sem
279      * held in write mode.
280      */
281     vma->vm_flags = newflags;
282     vma->vm_page_prot = pgprot_modify(vma->vm_page_prot,
283                       vm_get_page_prot(newflags));
284
285     if (vma_wants_writenotify(vma)) {
286         vma->vm_page_prot = vm_get_page_prot(newflags & ~VM_SHARED);
287         dirty_accountable = 1;
288     }
289
290     change_protection(vma, start, end, vma->vm_page_prot,
291               dirty_accountable, 0);
...
up_write(&current->mm->mmap_sem);
 
mm/migrate.c
move_to_new_page()
...
remove_migration_ptes()
rmap_walk_anon()
rmap_walk_anon_lock()
down_read(&anon_vma->root->rwsem);
remove_migration_pte()
    223     get_page(new);
 224     pte = pte_mkold(mk_pte(new, vma->vm_page_prot));
 225     if (pte_swp_soft_dirty(*ptep))
 226         pte = pte_mksoft_dirty(pte);
 227
 228     /* Recheck VMA as permissions can change since migration started  */
 229     if (is_write_migration_entry(entry))
 230         pte = maybe_mkwrite(pte, vma);
 231
 232 #ifdef CONFIG_HUGETLB_PAGE
 233     if (PageHuge(new)) {
 234         pte = pte_mkhuge(pte);
 235         pte = arch_make_huge_pte(pte, vma, new, 0);
 236     }
 237 #endif
 238     flush_dcache_page(new);
 239     set_pte_at(mm, addr, ptep, pte);
 240
anon_vma_unlock_read
up_read(&anon_vma->root->rwsem);
...
 
During mprotect syscall, it hold rw-lock of mm->mmap_sem, then try to merge vma with nearby same vma. But if it just merged vma at line 257 in mm/mprotect.c and interrupted by other tasks, at this time this vma has a new vma rage but vm_page_prot did not changed(still keep old value). If at this time other task triggered a migration process and just migrate page of this new vma, in finction remove_migration_pte it hold anon_vma's root rwsem lock and get this vma's old vm_page_prot and set it to it's PTE. Then mprotect resume to running and this vma's vm_page_prot changed at line 282 in mm/mprotect.c. It's too late for migration process. And step change_protection only changed ptes in range with input parameter, not full range of this VMA. So the wrong pte value set by migration step would not be corrected. Finally caused page fault happen again and again on ARM64 platform. Problem may be migration and mprotect hold different rwsem, and can't prevent this race condition.
 
I have checked with latest linux stable code, and found migration and mprotect operation almost same.
Please help to check this problem or help me to find if something is wrong with my conclude.
 

<<attachment: 60fead12.diff.zip>>

Attachment: EVT35-813-5163-boot4-0518_.log
Description: Binary data


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux