On Thu, Jan 11, 2018 at 6:11 AM, Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> I retested with some debug printk() patch.

Could you perhaps enable KASAN too?

> [ 38.988178] Out of memory: Kill process 354 (b.out) score 7 or sacrifice child
> [ 38.991145] Killed process 354 (b.out) total-vm:2099260kB, anon-rss:23288kB, file-rss:8kB, shmem-rss:0kB
> [ 38.996277] oom_reaper: started reaping
> [ 38.999033] BUG: unable to handle kernel paging request at c130d86d
> [ 39.001802] IP: _raw_spin_lock_irqsave+0x1c/0x40

The "Code:" line shows the whole function in this case:

   0:   55                      push   %ebp
   1:   89 c1                   mov    %eax,%ecx
   3:   89 e5                   mov    %esp,%ebp
   5:   56                      push   %esi
   6:   53                      push   %ebx
   7:   9c                      pushf
   8:   58                      pop    %eax
   9:   66 66 66 90             nop
   d:   89 c6                   mov    %eax,%esi
   f:   fa                      cli
  10:   66 66 90                nop
  13:   66 90                   nop
  15:   31 c0                   xor    %eax,%eax
  17:   bb 01 00 00 00          mov    $0x1,%ebx
  1c:   3e 0f b1 19             cmpxchg %ebx,%ds:*(%ecx)   <-- trapping instruction
  20:   85 c0                   test   %eax,%eax
  22:   75 06                   jne    0x2a
  24:   89 f0                   mov    %esi,%eax
  26:   5b                      pop    %ebx
  27:   5e                      pop    %esi
  28:   5d                      pop    %ebp
  29:   c3                      ret

although it isn't all that interesting since it's just
"_raw_spin_lock_irqsave". The odd "nop" instructions are because of
paravirtualization support leaving room for rewriting the eflags
operations.

Anyway, %ecx is garbage - it *should* be "&memcg->move_lock",
apparently. The caller does:

  again:
        memcg = page->mem_cgroup;
        if (unlikely(!memcg))
                return NULL;

        if (atomic_read(&memcg->moving_account) <= 0)
                return memcg;

        spin_lock_irqsave(&memcg->move_lock, flags);
        if (memcg != page->mem_cgroup) {
                spin_unlock_irqrestore(&memcg->move_lock, flags);
                goto again;
        }

What's a bit odd is how the access to "memcg->move_lock" seems to
trap, but we did that atomic_read() from memcg->moving_account ok.

The reason seems to be that this is actually a valid kernel pointer,
but it's write-protected (read-only), so the read worked and the
write did not:

> [ 39.004069] *pde = 01f88063 *pte = 0130d161
> [ 39.006250] Oops: 0003 [#1] SMP DEBUG_PAGEALLOC

That "0003" means that it was a protection fault on a write.

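For anyone following along, here is a minimal user-space sketch (not kernel code) of how the numbers in that oops decode. The error-code and PTE bit positions are the architectural x86 ones for 32-bit non-PAE paging. The macro and function names are made up for illustration, and the 0xc0000000 lowmem offset is an assumption (the usual 3G/1G split), not something the log states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits. */
#define PF_PROT  (1u << 0)  /* 1 = protection violation, 0 = page not present */
#define PF_WRITE (1u << 1)  /* 1 = write access, 0 = read access */
#define PF_USER  (1u << 2)  /* 1 = fault taken in user mode */

/* Low bits of a 32-bit non-PAE page table entry. */
#define PTE_PRESENT (1u << 0)
#define PTE_RW      (1u << 1)   /* clear means the page is read-only */
#define PTE_FRAME   0xfffff000u /* physical page frame address */

/* ASSUMPTION: lowmem is mapped 1:1 at this offset (default 3G/1G split). */
#define ASSUMED_PAGE_OFFSET 0xc0000000u

/* True for a kernel-mode write that hit a present but protected page. */
static bool fault_is_kernel_write_protection(uint32_t err)
{
	return (err & (PF_PROT | PF_WRITE | PF_USER)) == (PF_PROT | PF_WRITE);
}

/* True if the PTE maps a page that can be read but not written. */
static bool pte_is_present_readonly(uint32_t pte)
{
	return (pte & PTE_PRESENT) && !(pte & PTE_RW);
}

/* True if the PTE's frame is the 1:1 lowmem frame for this address. */
static bool pte_maps_lowmem_address(uint32_t pte, uint32_t vaddr)
{
	return (pte & PTE_FRAME) ==
	       ((vaddr - ASSUMED_PAGE_OFFSET) & PTE_FRAME);
}

/* The values below are copied from the quoted oops. */
static void check_quoted_oops(void)
{
	assert(fault_is_kernel_write_protection(0x0003));
	assert(pte_is_present_readonly(0x0130d161));
	assert(pte_maps_lowmem_address(0x0130d161, 0xc130d86d));
}
```

Calling check_quoted_oops() passes under those assumptions: error code 0003 is "present page, write, kernel mode", and the pte value has the present bit set with the writable bit clear.
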
The "*pte" thing agrees. It's the normal 1:1 mapping of the physical
page 0130d000 (which matches the virtual address c130d86d), but it's
presumably a kernel code pointer and thus RO.

So presumably "page->mem_cgroup" was just a random pointer. Which
probably means that "page" itself is not actually a page pointer,
since I assume there was no memory hotplug going on here?

> [ 39.022885] EIP: _raw_spin_lock_irqsave+0x1c/0x40
> [ 39.037889] Call Trace:
> [ 39.043562] lock_page_memcg+0x25/0x80
> [ 39.045421] page_remove_rmap+0x87/0x2e0
> [ 39.047315] try_to_unmap_one+0x20e/0x590
> [ 39.049198] rmap_walk_file+0x13c/0x250
> [ 39.051012] rmap_walk+0x32/0x60
> [ 39.052619] try_to_unmap+0x4d/0x100
> [ 39.059849] shrink_page_list+0x3a2/0x1000
> [ 39.061678] shrink_inactive_list+0x1b2/0x440
> [ 39.063539] shrink_node_memcg+0x34a/0x770
> [ 39.065297] shrink_node+0xbb/0x2e0
> [ 39.066920] do_try_to_free_pages+0xba/0x320
> [ 39.068752] try_to_free_pages+0x11d/0x330
> [ 39.072084] __alloc_pages_slowpath+0x303/0x6d9
> [ 39.075932] __alloc_pages_nodemask+0x16d/0x180
> [ 39.077809] do_anonymous_page+0xab/0x4f0
> [ 39.079551] handle_mm_fault+0x531/0x8d0
> [ 39.084422] __do_page_fault+0x1ea/0x4d0
> [ 39.087666] do_page_fault+0x1a/0x20
> [ 39.089184] common_exception+0x6f/0x76

Looks like the page pointer came from shrink_inactive_list() doing
isolate_lru_pages().

Scary. It all seems to just mean that the page LRU queues are corrupted.

Most (all?) of your other oopses seem to have somewhat similar
patterns: shrink_inactive_list() -> rmap_walk_file() -> oops due to
garbage.

> Overall, memory corruption is strongly suspected.

Yeah, this very much looks like some internal VM memory corruption.

Which is why I'm wondering if enabling KASAN might help find the
actual access that causes the corruption. Or at least an _earlier_
access that is closer to it than these that all seem to be fairly far
removed from where it actually all started..

                Linus