The patch titled Subject: mm/hugetlb: prevent hugetlb VMA to be misaligned has been added to the -mm tree. Its filename is mm-hugetlb-prevent-hugetlb-vma-to-be-misaligned.patch This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-prevent-hugetlb-vma-to-be-misaligned.patch and later at http://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-prevent-hugetlb-vma-to-be-misaligned.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/SubmitChecklist when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: Laurent Dufour <ldufour@xxxxxxxxxxxxxxxxxx> Subject: mm/hugetlb: prevent hugetlb VMA to be misaligned When running the sampler detailed below, the kernel, if built with the VM debug option turned on (as many distro do), is panicing with the following message: kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310! Oops: Exception in kernel mode, sig: 5 [#1] LE SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt 8<--8<--8<--8< snip 8<--8<--8<--8< CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G C E 4.15.0-10-generic #11-Ubuntu NIP: c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009 REGS: c000003fbcdcf810 TRAP: 0700 Tainted: G C E (4.15.0-10-generic) MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24002222 XER: 20040000 CFAR: c00000000036ee44 SOFTE: 1 GPR00: c00000000036ee48 c000003fbcdcfa90 c0000000016ea600 c000003fbcdcfc40 GPR04: c000003fd9858950 00007115e4e00000 00007115e4e10000 0000000000000000 GPR08: 0000000000000010 0000000000010000 0000000000000000 0000000000000000 GPR12: 0000000000002000 c000000007a2c600 00000fe3985954d0 00007115e4e00000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 00000fe398595a94 000000000000a6fc c000003fd9858950 0000000000018554 GPR24: c000003fdcd84500 c0000000019acd00 00007115e4e10000 c000003fbcdcfc40 GPR28: 0000000000200000 00007115e4e00000 c000003fbc9ac600 c000003fd9858950 NIP [c00000000036e764] __unmap_hugepage_range+0xa4/0x760 LR [c00000000036ee48] __unmap_hugepage_range_final+0x28/0x50 Call Trace: [c000003fbcdcfa90] [00007115e4e00000] 0x7115e4e00000 (unreliable) [c000003fbcdcfb50] [c00000000036ee48] __unmap_hugepage_range_final+0x28/0x50 [c000003fbcdcfb80] [c00000000033497c] unmap_single_vma+0x11c/0x190 [c000003fbcdcfbd0] [c000000000334e14] unmap_vmas+0x94/0x140 [c000003fbcdcfc20] [c00000000034265c] exit_mmap+0x9c/0x1d0 [c000003fbcdcfce0] [c000000000105448] mmput+0xa8/0x1d0 [c000003fbcdcfd10] [c00000000010fad0] do_exit+0x360/0xc80 [c000003fbcdcfdd0] [c0000000001104c0] do_group_exit+0x60/0x100 [c000003fbcdcfe10] [c000000000110584] SyS_exit_group+0x24/0x30 [c000003fbcdcfe30] [c00000000000b184] system_call+0x58/0x6c Instruction dump: 552907fe e94a0028 e94a0408 eb2a0018 81590008 7f9c5036 0b090000 e9390010 7d2948f8 7d2a2838 0b0a0000 7d293038 <0b090000> e9230086 2fa90000 419e0468 ===[ end trace ee88f958a1c62605 ]=== The panic is due to a VMA pointing to a hugetlb area while the vma->vm_start or vma->vm_end field are not aligned to the huge page boundaries. The sampler is just unmapping a part of the hugetlb area, leading to 2 VMAs which are not well aligned. The same could be achieved by calling madvise() situation, as it is when running: stress-ng --shm-sysv 1 The hugetlb code is assuming that the VMA will be well aligned when it is unmapped, so we must prevent such a VMA from bing split or shrunk to a misaligned address. This patch prevents this by checking the new VMA's boundaries when a VMA is modified by calling vma_adjust(). === Sampler used to hit the panic nclude <sys/ipc.h> unsigned long page_size; int main(void) { int shmid, ret=1; void *addr; setbuf(stdout, NULL); page_size = getpagesize(); shmid = shmget(0x1410, LENGTH, IPC_CREAT | SHM_HUGETLB | SHM_R | SHM_W); if (shmid < 0) { perror("shmget"); exit(1); } printf("shmid: %d ", shmid); addr = shmat(shmid, NULL, 0); if (addr == (void*)-1) { perror("shmat"); goto out; } /* * The following munmap() call will split the VMA in 2, leading to * unaligned to huge page size VMAs which will trigger a check when * shmdt() is called. */ if (munmap(addr + HPSIZE + page_size, page_size)) { perror("munmap"); goto out; } if (shmdt(addr)) { perror("shmdt"); goto out; } printf("test done. "); ret = 0; out: shmctl(shmid, IPC_RMID, NULL); return ret; } === End of code Link: http://lkml.kernel.org/r/1521566754-30390-1-git-send-email-ldufour@xxxxxxxxxxxxxxxxxx Signed-off-by: Laurent Dufour <ldufour@xxxxxxxxxxxxxxxxxx> Cc: Michal Hocko <mhocko@xxxxxxxxxx> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> Cc: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> Cc: <stable@xxxxxxxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- diff -puN mm/mmap.c~mm-hugetlb-prevent-hugetlb-vma-to-be-misaligned mm/mmap.c --- a/mm/mmap.c~mm-hugetlb-prevent-hugetlb-vma-to-be-misaligned +++ a/mm/mmap.c @@ -692,6 +692,17 @@ int __vma_adjust(struct vm_area_struct * long adjust_next = 0; int remove_next = 0; + if (is_vm_hugetlb_page(vma)) { + /* + * We must check against the huge page boundarie to not + * create misaligned VMA. + */ + struct hstate *h = hstate_vma(vma); + + if (start & ~huge_page_mask(h) || end & ~huge_page_mask(h)) + return -EINVAL; + } + if (next && !insert) { struct vm_area_struct *exporter = NULL, *importer = NULL; _ Patches currently in -mm which might be from ldufour@xxxxxxxxxxxxxxxxxx are mm-hugetlb-prevent-hugetlb-vma-to-be-misaligned.patch -- To unsubscribe from this list: send the line "unsubscribe mm-commits" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html