> -----Original Message-----
> From: Michal Hocko [mailto:mstsxfx@xxxxxxxxx] On Behalf Of Michal Hocko
> Sent: Thursday, September 11, 2014 5:29 AM
> To: Dexuan Cui
> Cc: linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Johannes Weiner;
> Hugh Dickins; KY Srinivasan
> Subject: Re: page fault in mem_cgroup_page_lruvec() due to memory hot-add
>
> On Thu 11-09-14 12:07:13, Dexuan Cui wrote:
> > Hi all,
> >
> > When I try to run Ubuntu 14.10 guest (the nightly build with the
> > kernel version 3.16.0-12-generic) on hyper-v, occasionally, I get the
> > below panic(see the end of the mail) suddenly.
> > (I suppose it's likely the upstream kernel has the issue too)
> >
> > When the panic happens, I'm running a memory stress program to test
> > the balloon driver drivers/hv/hv_balloon.c, which can hot-add memory
> > to the guest by invoking memory_add_physaddr_to_nid() and
> > add_memory(), if the feature "Dynamic Memory" is enabled.
> >
> > The issue here is: the memory hot-add seems successful, but
> > occasionally the page fault can happen and crash the whole guest.
> >
> > It looks the crash only happens to the guest in the SMP guest case. I
> > never get the crash when the guest is configured with 1 vCPU.
> >
> > Sometimes it's very difficult to reproduce the crash while sometimes
> > it's relatively easy.
> >
> > Can anybody please shed some light?
> >
> > Thanks!
> >
> > -- Dexuan
> >
> > [ 99.211382] BUG: unable to handle kernel paging request at 0000000000c0b608
> > [ 99.215308] IP: [<ffffffff811d2e9c>] mem_cgroup_page_lruvec+0x2c/0xa0
> > [ 99.215308] PGD 37544067 PUD 393c2067 PMD 0
> > [ 99.215308] Oops: 0000 [#1] SMP
> > [ 99.215308] Modules linked in: bnep rfcomm bluetooth 6lowpan_iphc joydev hid_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd hyperv_keyboard hv_balloon hid_hyperv hid serio_raw i2c_piix4 mac_hid parport_pc ppdev lp parport hv_netvsc hv_utils hv_storvsc psmouse hv_vmbus pata_acpi floppy
> > [ 99.215308] CPU: 3 PID: 1919 Comm: stressapptest Not tainted 3.16.0-12-generic #18-Ubuntu
> > [ 99.215308] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 05/23/2012
> > [ 99.215308] task: ffff880034282880 ti: ffff8800415f8000 task.ti: ffff8800415f8000
> > [ 99.215308] RIP: 0010:[<ffffffff811d2e9c>] [<ffffffff811d2e9c>] mem_cgroup_page_lruvec+0x2c/0xa0
> > [ 99.215308] RSP: 0000:ffff8800415fbc58 EFLAGS: 00010006
> > [ 99.215308] RAX: 0000000000c0b600 RBX: ffff88003ffebf80 RCX: ffff88003ffea300
> > [ 99.215308] RDX: 02ffff00000d0001 RSI: ffff88003ffebf80 RDI: ffffea000302d800
> > [ 99.215308] RBP: ffff8800415fbc68 R08: 0000000000000008 R09: 0000000000000004
> > [ 99.215308] R10: 00000000ffffffff R11: ffff880033ffd400 R12: ffffea000302d800
> > [ 99.215308] R13: ffffea000302d800 R14: ffff88003b4703c0 R15: 0000000000000202
> > [ 99.215308] FS: 00007fb0075fa700(0000) GS:ffff88003b460000(0000) knlGS:0000000000000000
> > [ 99.215308] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 99.215308] CR2: 0000000000c0b608 CR3: 00000000389b4000 CR4: 00000000000006e0
> > [ 99.215308] Stack:
> > [ 99.215308] ffff88003ffebf80 0000000000000002 ffff8800415fbcc0 ffffffff81178604
> > [ 99.215308] 0000000000000000 ffffffff81177bf0 0000000000000296 ffff8800415fbc90
> > [ 99.215308] 0000000000000003 000000000004b5d8 00000000000200da ffff880036db6480
> > [ 99.215308] Call Trace:
> > [ 99.215308] [<ffffffff81178604>] pagevec_lru_move_fn+0xc4/0x130
> > [ 99.215308] [<ffffffff81177bf0>] ? __activate_page+0x1e0/0x1e0
> > [ 99.215308] [<ffffffff81178b7e>] lru_add_drain_cpu+0xce/0xe0
> > [ 99.215308] [<ffffffff81178c96>] lru_add_drain+0x16/0x20
> > [ 99.215308] [<ffffffff811aa2e6>] swapin_readahead+0x126/0x1a0
> > [ 99.215308] [<ffffffff81198c17>] handle_mm_fault+0xc87/0xf90
> > [ 99.215308] [<ffffffff8105ce42>] __do_page_fault+0x1c2/0x580
> > [ 99.215308] [<ffffffff810a9d18>] ? __enqueue_entity+0x78/0x80
> > [ 99.215308] [<ffffffff810ae8c4>] ? update_curr+0xf4/0x180
> > [ 99.215308] [<ffffffff810ab3f8>] ? pick_next_entity+0x88/0x180
> > [ 99.215308] [<ffffffff810b3b8e>] ? pick_next_task_fair+0x57e/0x8d0
> > [ 99.215308] [<ffffffff810a84e8>] ? sched_clock_cpu+0x88/0xb0
> > [ 99.215308] [<ffffffff8105d231>] do_page_fault+0x31/0x70
> > [ 99.215308] [<ffffffff81782908>] page_fault+0x28/0x30
> > [ 99.215308] Code: 66 66 66 90 8b 15 15 2a b5 00 55 48 8d 86 48 05 00 00 48 89 e5 41 54 53 85 d2 48 89 f3 75 56 49 89 fc e8 18 44 00 00 49 8b 14 24 <48> 8b 48 08 83 e2 20 75 1b 48 8b 10 83 e2 02 75 13 48 8b 15 54
> > [ 99.215308] RIP [<ffffffff811d2e9c>] mem_cgroup_page_lruvec+0x2c/0xa0
> > [ 99.215308] RSP <ffff8800415fbc58>
> > [ 99.215308] CR2: 0000000000c0b608
> > [ 99.215308] ---[ end trace 24db5f2378e898cb ]---
>
> This decodes to:
> All code
> ========
>    0:   66 66 66 90             data16 data16 xchg %ax,%ax
>    4:   8b 15 15 2a b5 00       mov    0xb52a15(%rip),%edx        # 0xb52a1f
>    a:   55                      push   %rbp
>    b:   48 8d 86 48 05 00 00    lea    0x548(%rsi),%rax
>   12:   48 89 e5                mov    %rsp,%rbp
>   15:   41 54                   push   %r12
>   17:   53                      push   %rbx
>   18:   85 d2                   test   %edx,%edx
>   1a:   48 89 f3                mov    %rsi,%rbx
>   1d:   75 56                   jne    0x75
>   1f:   49 89 fc                mov    %rdi,%r12
>   22:   e8 18 44 00 00          callq  0x443f
>   27:   49 8b 14 24             mov    (%r12),%rdx
>   2b:*  48 8b 48 08             mov    0x8(%rax),%rcx             <-- trapping instruction
>   2f:   83 e2 20                and    $0x20,%edx
>   32:   75 1b                   jne    0x4f
>   34:   48 8b 10                mov    (%rax),%rdx
>   37:   83 e2 02                and    $0x2,%edx
>   3a:   75 13                   jne    0x4f
>   3c:   48                      rex.W
>   3d:   8b                      .byte 0x8b
>   3e:   15                      .byte 0x15
>   3f:   54                      push   %rsp
>
> and that matches to the following code:
>         pc = lookup_page_cgroup(page);
>         memcg = pc->mem_cgroup;         <<< BANG
>
> So the lookup_page_cgroup returned a garbage (rax is supposed to be pc
> pointer but the value is definitely not a kernel pointer. It looks like an offset
> from zero base address). The page itself (rdi resp. r12 looks pretty normal to
> me). I would strongly suspect that the HyperV is doing something nasty
> when offlining the memory. Because there shouldn't be any page left behind
> when the node_data resp. mem_section (depending on the used memory
> model) is torn down.
>
> KY, any ideas?

I will take a look.

K. Y

> --
> Michal Hocko
> SUSE Labs
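
For anyone following the register analysis above, here is a minimal userspace sketch (not kernel code) of the two checks it relies on: that the faulting address in CR2 is RAX plus the 0x8 displacement of the trapping `mov 0x8(%rax),%rcx`, and that RAX is not in the kernel half of the address space while RDI is. The helper names `looks_like_kernel_pointer()`, `fault_address()` and `check_oops_registers()` are made up for illustration, and the kernel-half test assumes x86_64 with 48-bit virtual addresses (4-level paging).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, assuming x86_64 with 48-bit virtual addresses:
 * kernel addresses sit in the upper canonical half, so bits 63..47 are
 * all set and the value is at least 0xffff800000000000.
 */
bool looks_like_kernel_pointer(uint64_t addr)
{
	return addr >= 0xffff800000000000ULL;
}

/* Effective address of a "mov disp(%reg),..." style load. */
uint64_t fault_address(uint64_t base, uint64_t disp)
{
	return base + disp;
}

/* Apply both checks to the register values copied from the oops above. */
void check_oops_registers(void)
{
	const uint64_t rax = 0x0000000000c0b600ULL; /* supposed to be pc */
	const uint64_t rdi = 0xffffea000302d800ULL; /* struct page pointer */
	const uint64_t cr2 = 0x0000000000c0b608ULL; /* faulting address */

	/* The trapping instruction reads 0x8(%rax), which matches CR2. */
	assert(fault_address(rax, 0x8) == cr2);

	/* RAX is not a kernel address; RDI is. */
	assert(!looks_like_kernel_pointer(rax));
	assert(looks_like_kernel_pointer(rdi));
}
```

Calling `check_oops_registers()` from a small `main()` passes all three assertions with the values from this report, which is consistent with the reading that the `pc` returned by lookup_page_cgroup() was garbage while the page pointer itself was fine.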