On 01/12/2012 12:38 AM, Mel Gorman wrote: > On Wed, Jan 11, 2012 at 11:37:56PM +0530, Srivatsa S. Bhat wrote: >> Hi, >> I was running the latest kernel and not doing anything in particular. >> Eventually the machine locked up hard and due to my config setting >> (panic on hard-lockup), I got a kernel panic. >> >> Looks like there are several issues involved. >> > > Not sure why you are sending this directly to me but anyway; No particular reason. I was just Cc'ing mm developers and you just happened to come first on my list :-) > > When you say "not doing anything in particular", what do you mean? Does > this happen early in boot or just when running even light loads? > This happened only once and at that time, I was not running any jobs at all. The system was idle. I was working on some other system and when I got back to this one, I saw that it was completely hung and then I observed the hard-lockup and kernel panic on the console. > By latest kernel, your log says 3.2.0-0.0.0.28.36b5ec9-default. The > 3.2.0 is clear enough. What is 0.0.0.28.36b5ec9? It does not look like a > mainline git commit so have you applied some other patches or tree on > top? > This is the latest mainline tree as of yesterday when I tested it (git commit e343a895a) and this is after 3.2. (Ignore what the log says please). There were 2 quite unrelated patches I had applied on top of this: - a patch related to bnx2 (broadcom) to get my network working. - the MCE related rcu splat fix patch posted in https://lkml.org/lkml/2012/1/11/177 > If there are other patches applied, can you try vanilla 3.2? If that > fails, did 3.1 work? If yes, can you you bisect it? If you do not have > time for a full bisect, it might help to begin the bisect near commit > [02125a8: fix apparmor dereferencing potentially freed dentry, sanitize > __d_path() API]. Alternatively testing with apparmor=0 might be useful. > I had not hit this problem with 3.2-rc7 (the last kernel I ran before running this one). Commit 02125a8 seems to be from 3.2-rc5. > The first bug triggered in mm/slab.c and everything after that looks > like fallout from the first BUG_ON so that is worth figuring out first. > >> Here is the log: >> >> [ 7314.423828] ------------[ cut here ]------------ >> [ 7314.427769] kernel BUG at mm/slab.c:3111! >> [ 7314.427769] invalid opcode: 0000 [#1] SMP > > This in itself is suspicious. On kernel 3.2, this does not correspond > to a BUG_ON (the closest BUG_ON is in line 3109). In the latest git, > there is a BUG_ON on 3111 but that does not match your commit. Test > again with vanilla 3.2. > > As I said, my kernel _is_ the latest git. Please ignore what the log says. Thank you very much for your inputs, I will see if this problem occurs on vanilla 3.2 as well. > >> [ 7314.427769] CPU 3 >> [ 7314.427769] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod bnx2 ioatdma tpm_tis tpm cdc_ether usbnet i2c_i801 iTCO_wdt mii i7core_edac i2c_core dca edac_core iTCO_vendor_support rtc_cmos tpm_bios shpchp pci_hotplug button pcspkr serio_raw sg uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon >> [ 7314.427769] >> [ 7314.427769] Pid: 6699, comm: cron Tainted: G W 3.2.0-0.0.0.28.36b5ec9-default #3 IBM IBM System x -[7870C4Q]-/68Y8033 >> [ 7314.427769] RIP: 0010:[<ffffffff8115bcf9>] [<ffffffff8115bcf9>] cache_alloc_refill+0x1e9/0x290 >> [ 7314.427769] RSP: 0018:ffff8808c881bc48 EFLAGS: 00010046 >> [ 7314.427769] RAX: 000000000000000f RBX: ffff8808ca66b000 RCX: 0000000000000018 >> [ 7314.427769] RDX: ffff8808c7e2d040 RSI: ffff8808c8f60040 RDI: 0000000000000024 >> [ 7314.427769] RBP: ffff8808c881bc88 R08: ffff8808ff802510 R09: ffff8808ff802520 >> [ 7314.427769] R10: dead000000200200 R11: dead000000100100 R12: 0000000000000024 >> [ 7314.427769] R13: ffff8808ff800880 R14: ffff8808ff802500 R15: 0000000000000000 >> [ 7314.427769] FS: 00007fdcd8f54780(0000) GS:ffff8808ffcc0000(0000) knlGS:0000000000000000 >> [ 7314.427769] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 7314.427769] CR2: ffffffffff600400 CR3: 00000008c6e95000 CR4: 00000000000006e0 >> [ 7314.427769] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> [ 7314.427769] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> [ 7314.427769] Process cron (pid: 6699, threadinfo ffff8808c881a000, task ffff8808c68a0380) >> [ 7314.427769] Stack: >> [ 7314.427769] ffffffff81785cf1 00000000000412d0 ffff8808ff802540 ffff8808ff800880 >> [ 7314.427769] ffff8808ff800880 0000000000000100 00000000000000d0 00000000000000d0 >> [ 7314.427769] ffff8808c881bcd8 ffffffff8115c7e7 ffff8808c881bd26 ffffffff81230418 >> [ 7314.427769] Call Trace: >> [ 7314.427769] [<ffffffff8115c7e7>] __kmalloc+0x327/0x330 >> [ 7314.427769] [<ffffffff81230418>] ? aa_get_name+0x58/0x100 >> [ 7314.427769] [<ffffffff81230418>] aa_get_name+0x58/0x100 >> [ 7314.427769] [<ffffffff8120c229>] ? cap_bprm_set_creds+0x239/0x2a0 >> [ 7314.427769] [<ffffffff81230d92>] apparmor_bprm_set_creds+0x112/0x580 >> [ 7314.427769] [<ffffffff8109b44e>] ? __lock_release+0x7e/0x170 >> [ 7314.427769] [<ffffffff81131e2e>] ? might_fault+0x4e/0xa0 >> [ 7314.427769] [<ffffffff8120cbae>] security_bprm_set_creds+0xe/0x10 >> [ 7314.427769] [<ffffffff8117b48a>] prepare_binprm+0xca/0x140 >> [ 7314.427769] [<ffffffff8117d624>] do_execve_common+0x204/0x320 >> [ 7314.427769] [<ffffffff8117d7ca>] do_execve+0x3a/0x40 >> [ 7314.427769] [<ffffffff8100b079>] sys_execve+0x49/0x70 >> [ 7314.427769] [<ffffffff8149c0fc>] stub_execve+0x6c/0xc0 >> [ 7314.427769] Code: 08 49 89 76 10 eb a6 0f 1f 00 49 8b 76 20 41 c7 86 90 00 00 00 01 00 00 00 49 39 f1 74 97 8b 46 20 41 3b 45 18 0f 82 02 ff ff ff <0f> 0b eb fe 0f 1f 00 41 39 c4 41 89 c7 45 0f 46 fc e9 ab fe ff >> [ 7314.427769] RIP [<ffffffff8115bcf9>] cache_alloc_refill+0x1e9/0x290 >> [ 7314.427769] RSP <ffff8808c881bc48> > > This does not look familiar but I am not up to date on linux-mm. Pekka, > does this ring a bell? > Regards, Srivatsa S. Bhat IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>