On 09/23, Sasha Levin wrote:
>
> Another similar trace where we see a problem during process exit:
>
> [1922964.887922] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
> [1922964.890234] Modules linked in:
> [1922964.890844] CPU: 1 PID: 21477 Comm: trinity-c161 Tainted: G W 4.3.0-rc2-next-20150923-sasha-00079-gec04207-dirty #2569
> [1922964.892584] task: ffff880251858000 ti: ffff88009f258000 task.ti: ffff88009f258000
> [1922964.893723] RIP: acct_collect (kernel/acct.c:542)

	530 void acct_collect(long exitcode, int group_dead)
	531 {
	532 	struct pacct_struct *pacct = &current->signal->pacct;
	533 	cputime_t utime, stime;
	534 	unsigned long vsize = 0;
	535
	536 	if (group_dead && current->mm) {
	537 		struct vm_area_struct *vma;
	538
	539 		down_read(&current->mm->mmap_sem);
	540 		vma = current->mm->mmap;
	541 		while (vma) {
	542 			vsize += vma->vm_end - vma->vm_start;	// !!!!!!!!!!!!
	543 			vma = vma->vm_next;
	544 		}
	545 		up_read(&current->mm->mmap_sem);
	546 	}

> [1922964.895105] RSP: 0000:ffff88009f25f908 EFLAGS: 00010207
> [1922964.895935] RAX: dffffc0000000000 RBX: 2ce0ffffffffffff RCX: 0000000000000000
> [1922964.897008] RDX: ffff2152b153ffff RSI: 059c200000000000 RDI: 2ce1000000000007
> [1922964.898091] RBP: ffff88009f25f9e8 R08: 0000000000000001 R09: 00000000000003ef
> [1922964.899178] R10: ffffed014d7a3a01 R11: 0000000000000001 R12: ffff880082b485c0
> [1922964.901643] R13: ffff2152b153ffff R14: 1ffff10013e4bf24 R15: ffff88009f25f9c0

...

>    0:	02 00                	add    (%rax),%al
>    2:	0f 85 9d 05 00 00    	jne    0x5a5
>    8:	48 8b 1b             	mov    (%rbx),%rbx
>    b:	48 85 db             	test   %rbx,%rbx
>    e:	0f 84 7b 05 00 00    	je     0x58f

Probably "mov (%rbx),%rbx" is "vma = mm->mmap",

>   14:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
>   1b:	fc ff df
>   1e:	31 d2                	xor    %edx,%edx
>   20:	48 8d 7b 08          	lea    0x8(%rbx),%rdi

and this loads the addr of vma->vm_end for kasan,

>   24:	48 89 fe             	mov    %rdi,%rsi
>   27:	48 c1 ee 03          	shr    $0x3,%rsi
>   2b:*	80 3c 06 00          	cmpb   $0x0,(%rsi,%rax,1)		<-- trapping instruction

which reports the error. But in this case this is not a NULL-deref: note
that $rbx = 2ce0ffffffffffff, which is below __PAGE_OFFSET (but above
TASK_SIZE_MAX). It seems it is not even canonical. In any case this odd
value can't be valid.

Again, looks like the mm->mmap pointer was corrupted. Perhaps you can
re-test with the stupid patch below. But it is unlikely to help. If mm
was freed we would probably see something else.

Oleg.

---
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -672,6 +672,7 @@ struct mm_struct *mm_alloc(void)
 void __mmdrop(struct mm_struct *mm)
 {
 	BUG_ON(mm == &init_mm);
+	BUG_ON(atomic_read(&mm->mm_users));
 	mm_free_pgd(mm);
 	destroy_context(mm);
 	mmu_notifier_mm_destroy(mm);
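
For anyone who wants to double-check the arithmetic behind the trapping instruction, here is a rough userspace sketch, not kernel code. The shadow offset is the constant from the movabs above and the +8 is the vm_end offset from the lea. The helper names are made up for illustration, and the canonical check assumes 48-bit virtual addresses (bits 63..47 all equal), which is an assumption about this machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constant loaded by "movabs $0xdffffc0000000000,%rax" in the trace. */
#define SHADOW_OFFSET_FROM_TRACE 0xdffffc0000000000ULL

/*
 * Hypothetical helper. Assumption: 48-bit virtual addresses, so an
 * address is canonical only if bits 63..47 are all zero or all one.
 */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 bits 63..47 */

	return top == 0 || top == 0x1ffff;
}

/*
 * Hypothetical helper mirroring "shr $0x3,%rsi" followed by the
 * (%rsi,%rax,1) addressing mode of the cmpb.
 */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> 3) + SHADOW_OFFSET_FROM_TRACE;
}
```

Feeding in the values from the register dump: rbx + 8 is 2ce1000000000007 (matches RDI), shifted right by 3 it is 059c200000000000 (matches RSI), and the resulting shadow address e59c1c0000000000 is itself non-canonical under the 48-bit assumption. That would be consistent with the cmpb raising a general protection fault rather than a page fault.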