On 17 Jul 17:33, Naoya Horiguchi wrote:
> In my environment (kernel-3.14.12, libhugetlbfs-utils-2.16-2.fc20.x86_64),
> the crash looks like this:
>
> [root@test_140717-1333 hugetlbfs_test]# $ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap
> bash: $: command not found...
> p 0x2200010
> pid 2809
> *** Error in `./heap': break adjusted to free malloc space: 0x0000000002501000 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3940e75cff]
> /lib64/libc.so.6[0x3940e7f121]
> /lib64/libc.so.6(__libc_malloc+0x5c)[0x3940e7ff6c]
> ./heap[0x400767]
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x3940e21d65]
> ./heap[0x400619]
> ======= Memory map: ========
> 00400000-00401000 r-xp 00000000 fd:01 272411 /root/hugetlbfs_test/heap
> 00600000-00601000 r--p 00000000 fd:01 272411 /root/hugetlbfs_test/heap
> 00601000-00602000 rw-p 00001000 fd:01 272411 /root/hugetlbfs_test/heap
> 02200000-02600000 rw-p 00000000 00:0c 23209 /anon_hugepage (deleted)
> 02600000-02800000 rw-p 00400000 00:0c 25663 /anon_hugepage (deleted)
> 3940a00000-3940a20000 r-xp 00000000 fd:01 175094 /usr/lib64/ld-2.18.so
> 3940c1f000-3940c20000 r--p 0001f000 fd:01 175094 /usr/lib64/ld-2.18.so
> 3940c20000-3940c21000 rw-p 00020000 fd:01 175094 /usr/lib64/ld-2.18.so
> 3940c21000-3940c22000 rw-p 00000000 00:00 0
> 3940e00000-3940fb4000 r-xp 00000000 fd:01 175095 /usr/lib64/libc-2.18.so
> 3940fb4000-39411b4000 ---p 001b4000 fd:01 175095 /usr/lib64/libc-2.18.so
> 39411b4000-39411b8000 r--p 001b4000 fd:01 175095 /usr/lib64/libc-2.18.so
> 39411b8000-39411ba000 rw-p 001b8000 fd:01 175095 /usr/lib64/libc-2.18.so
> 39411ba000-39411bf000 rw-p 00000000 00:00 0
> 3941200000-3941203000 r-xp 00000000 fd:01 175098 /usr/lib64/libdl-2.18.so
> 3941203000-3941402000 ---p 00003000 fd:01 175098 /usr/lib64/libdl-2.18.so
> 3941402000-3941403000 r--p 00002000 fd:01 175098 /usr/lib64/libdl-2.18.so
> 3941403000-3941404000 rw-p 00003000 fd:01 175098 /usr/lib64/libdl-2.18.so
> 7f3860277000-7f386028c000 r-xp 00000000 fd:01 184953 /usr/lib64/libgcc_s-4.8.3-20140624.so.1
> 7f386028c000-7f386048b000 ---p 00015000 fd:01 184953 /usr/lib64/libgcc_s-4.8.3-20140624.so.1
> 7f386048b000-7f386048c000 r--p 00014000 fd:01 184953 /usr/lib64/libgcc_s-4.8.3-20140624.so.1
> 7f386048c000-7f386048d000 rw-p 00015000 fd:01 184953 /usr/lib64/libgcc_s-4.8.3-20140624.so.1
> 7f38604a2000-7f38604a6000 rw-p 00000000 00:00 0
> 7f38604a6000-7f38604b6000 r-xp 00000000 fd:01 177014 /usr/lib64/libhugetlbfs.so
> 7f38604b6000-7f38606b5000 ---p 00010000 fd:01 177014 /usr/lib64/libhugetlbfs.so
> 7f38606b5000-7f38606b6000 r--p 0000f000 fd:01 177014 /usr/lib64/libhugetlbfs.so
> 7f38606b6000-7f38606b7000 rw-p 00010000 fd:01 177014 /usr/lib64/libhugetlbfs.so
> 7f38606b7000-7f38606c2000 rw-p 00000000 00:00 0
> 7f38606d5000-7f38606d6000 rw-p 00000000 00:00 0
> 7f38606d6000-7f38606d8000 rw-p 00000000 00:00 0
> 7fff07c44000-7fff07c65000 rw-p 00000000 00:00 0 [stack]
> 7fff07d52000-7fff07d54000 r-xp 00000000 00:00 0 [vdso]
> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
>
> Is this crash the same as yours?

It's very similar in the sense that it's an assert in malloc. My error
is slightly different, but I guess the exact error depends on a few
runtime parameters.

> And it seems that this also happens on v3.16-rc5.
> So it might be an upstream bug, not a stable-specific matter.

That's my understanding as well. I only reported it for 3.4 and 3.14
because these were the kernels I could easily try my original test with.

> It looks strange to me that the problem is gone by removing the commit
> 4a705fef98 (although I confirmed it is,) because the kernel's behavior
> shouldn't change unless (is_hugetlb_entry_migration(entry) ||
> is_hugetlb_entry_hwpoisoned(entry)) is true. And I checked with systemtap
> that both these check returned false in the above test program.
> So I'm wondering why the commit makes difference for this test program.

I don't know why I am only thinking about this now.
Isn't this because your patch moved the huge_ptep_get() call before
huge_ptep_set_wrprotect() in the pte_present() cow case? Actually, I've
just tried re-adding the huge_ptep_get() call for that case, and it
fixes the problem for me...

Hmm, want a patch?

-- 
Guillaume Morin <guillaume@xxxxxxxxxxx>
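
As a rough illustration of the ordering in question, here is a toy
user-space model. It is not kernel code: every toy_* name, the pte bit
layout, and the two copy functions are hypothetical stand-ins. It also
assumes one particular reading of why the order could matter, namely
that an entry read before the write-protect step still carries the
write bit when it is installed for the child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for kernel types and helpers; NOT the real definitions. */
typedef uint64_t toy_pte_t;
#define TOY_PTE_PRESENT ((toy_pte_t)1 << 0)
#define TOY_PTE_WRITE   ((toy_pte_t)1 << 1)

/* Models huge_ptep_get(): read the current entry. */
static toy_pte_t toy_ptep_get(const toy_pte_t *ptep)
{
    return *ptep;
}

/* Models huge_ptep_set_wrprotect(): clear the write bit in place. */
static void toy_ptep_set_wrprotect(toy_pte_t *ptep)
{
    *ptep &= ~TOY_PTE_WRITE;
}

static bool toy_pte_writable(toy_pte_t pte)
{
    return (pte & TOY_PTE_WRITE) != 0;
}

/* Ordering A: the entry is read once, before write-protecting the source.
 * The child receives that earlier snapshot, write bit included. */
static void toy_copy_snapshot_first(toy_pte_t *src, toy_pte_t *dst, bool cow)
{
    assert(src && dst);
    toy_pte_t entry = toy_ptep_get(src);
    if (cow)
        toy_ptep_set_wrprotect(src);
    *dst = entry;
}

/* Ordering B: the entry is read again after write-protecting the source,
 * so the child receives the write-protected value. */
static void toy_copy_reread(toy_pte_t *src, toy_pte_t *dst, bool cow)
{
    assert(src && dst);
    toy_pte_t entry = toy_ptep_get(src);
    if (cow) {
        toy_ptep_set_wrprotect(src);
        entry = toy_ptep_get(src);
    }
    *dst = entry;
}
```

Starting from a parent entry with both TOY_PTE_PRESENT and TOY_PTE_WRITE
set and cow == true, ordering A leaves the child entry writable while
the parent is write-protected; ordering B leaves both write-protected.
Whether this is what actually happens in copy_hugetlb_page_range() is
the open question above.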