On Wed, Jul 30, 2008 at 6:10 PM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > On Wed, 30 Jul 2008 18:02:03 +0800 "Dave Young" <hidave.darkstar@xxxxxxxxx> wrote: > >> On Wed, Jul 30, 2008 at 5:08 PM, Andrew Morton >> <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: >> > On Tue, 29 Jul 2008 14:32:04 +0200 Johannes Berg <johannes@xxxxxxxxxxxxxxxx> wrote: >> > >> >> On Tue, 2008-07-29 at 13:57 +0800, Dave Young wrote: >> >> > While set my wlan0 to ad-hoc mode and then set the essid to something, >> >> > this bug triggered. >> >> > >> >> > I suspect it's related to debugfs or mutex? But I'm not sure. >> >> > >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9] >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] Pid: 9, comm: events/0 Tainted: G W (2.6.26-smp #3) >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP: 0060:[<c0450bc3>] EFLAGS: 00000246 CPU: 0 >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EIP is at mutex_lock_nested+0x1b3/0x280 >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] EAX: 00000000 EBX: f7bf5284 ECX: f7c7e000 EDX: f7bf52a8 >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] ESI: f7bf5288 EDI: 00000246 EBP: f7c7fedc ESP: f7c7fe9c >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] CR0: 8005003b CR2: b800c5ac CR3: 36e71000 CR4: 000006d0 >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] DR6: ffff0ff0 DR7: 00000400 >> >> > Jul 29 13:01:46 darkstar kernel: [ 8033.736493] [<c02375a6>] ? debugfs_create_file+0x46/0x210 >> >> >> >> It looks like it's getting stuck on >> >> mutex_lock(&parent->d_inode->i_mutex); >> >> >> >> inside debugfs_create_by_name, but I have no idea why that would happen. >> >> >> > >> > Dave, please try sysrq-W or sysrq-T during that 60 seconds, see if we >> > can find another task which is holding onto that lock. >> > >> > >> >> Andrew, the keyboard doesn't response after the bug triggered. >> >> So I just keep sysrq-W / sysrq-T every several seconds after I set the >> wlan0 interface. >> Attached please see the long log file. > > argh. Look: > > Jul 30 17:34:22 darkstar kernel: [ 145.111510] f703fbf8 00000082 00000002 f703fbe4 f703fbd8 00000000 f704ac40 f703fbcc > Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000046 c0669060 c066c580 c066c580 c066c580 f703fbe8 f704ad94 c1eaf580 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000001 f704ac40 00000000 f702ca00 c0452697 ffff55ba 00000000 c014b06d > Jul 30 17:34:22 darkstar kernel: [ 145.111510] Call Trace: > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0452697>] ? _spin_unlock_irqrestore+0x47/0x60 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b06d>] ? trace_hardirqs_on+0xbd/0x140 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0450265>] schedule_timeout+0x75/0xc0 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c01942c0>] ? __pollwait+0x60/0xd0 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<f888b02e>] ? evdev_poll+0x2e/0x60 [evdev] > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0193564>] do_sys_poll+0x264/0x3a0 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0194260>] ? __pollwait+0x0/0xd0 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0122160>] ? default_wake_function+0x0/0x10 > Jul 30 17:34:22 darkstar last message repeated 3 times > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c01220c5>] ? try_to_wake_up+0x75/0x110 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014aed0>] ? mark_held_locks+0x40/0x80 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b77b>] ? __lock_acquire+0x24b/0x1040 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b77b>] ? __lock_acquire+0x24b/0x1040 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c03da639>] ? sock_def_readable+0x59/0x80 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c014b77b>] ? __lock_acquire+0x24b/0x1040 > Jul 30 17:34:22 darkstar last message repeated 5 times > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0109d0f>] ? native_sched_clock+0x7f/0xb0 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0243b4f>] ? security_file_permission+0xf/0x20 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0186387>] ? rw_verify_area+0x57/0xc0 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0451f5d>] ? _spin_unlock+0x1d/0x20 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0186b94>] ? vfs_read+0x94/0x130 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<f888be60>] ? evdev_read+0x0/0x1b0 [evdev] > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0103dbf>] ? restore_nocheck+0x12/0x15 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c01936cd>] sys_poll+0x2d/0x70 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0103d5e>] syscall_call+0x7/0xb > Jul 30 17:34:22 darkstar kernel: [ 145.111510] ======================= > Jul 30 17:34:22 darkstar kernel: [ 145.111510] f63efd94 00000092 00000002 00000000 00000000 f5064028 f7049d80 00000001 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000000 c0669060 c066c580 c066c580 c066c580 00000000 f7049ed4 c1e5d580 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] 00000000 00000000 00000135 f702e080 f7049d80 00000003 00000000 f7049d80 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] Call Trace: > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0450265>] schedule_timeout+0x75/0xc0 > Jul 30 17:34:22 darkstar kernel: [ 145.111510] [<c0436b98>] unix_stream_recvmsg+0x2a8/0x4e0 > > > We're missing part of the sysrq traces. I noticed this a while back. > We need this info to find the D-state processes. > > It's _possible_ that the kernel is actually trying to print this info, > but it's doing it at a different prink facility level from the stack > output, so we're only seeing part of the printing. > > Please try changing your logging level (dmesg -n 0, or dmesg -n 7 - I > always forget which). Then retry the sysrq-T. If that successfully > causes _all_ the output to be presented, then please rerun the above > test. > > You might want to compress the file too - I don't know whether the vger > servers will accept a 730kB mail. > > Ok here it is. BTW, I run "klogd -c 7" after boot This time I get a kmalloc poison overwritten: Jul 31 09:51:11 darkstar kernel: [ 2024.960036] BUG kmalloc-4096: Poison overwritten Jul 31 09:51:11 darkstar kernel: [ 2024.960036] ----------------------------------------------------------------------------- Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: 0xf6f3a080-0xf6f3a0ef. First byte 0x80 instead of 0x6b Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Allocated in dev_alloc_skb+0x1c/0x30 age=3642 cpu=0 pid=0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Freed in skb_release_data+0x57/0x80 age=3146 cpu=0 pid=2398 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Slab 0xc1c05440 objects=7 used=3 fp=0xf6f3a060 flags=0x400020c3 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] INFO: Object 0xf6f3a060 @offset=8288 fp=0xf6f39030 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Bytes b4 0xf6f3a050: 5e 09 00 00 57 c9 05 00 5a 5a 5a 5a 5a 5a 5a 5a ^...WÉ..ZZZZZZZZ Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a080: 80 00 00 00 ff ff ff ff ff ff 00 17 7b 00 46 40 ....ÿÿÿÿÿÿ..{.F@ Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a090: 00 17 7b 00 46 40 30 09 81 21 08 7a 21 00 00 00 ..{.F@xxx!.z!... Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a0a0: 64 00 21 04 00 07 00 00 00 00 00 00 00 01 08 82 d.!............. Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a0b0: 84 8b 0c 12 96 18 24 03 01 01 05 04 00 02 00 00 ......$......... Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a0c0: 07 06 43 4e 20 01 0d 14 2a 01 00 32 04 30 48 60 ..CN....*..2.0H` Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Object 0xf6f3a0d0: 6c dd 18 00 17 7b 01 04 00 00 00 01 00 00 00 10 lÝ...{.......... Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Redzone 0xf6f3b060: bb bb bb bb »»»» Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Padding 0xf6f3b088: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ Jul 31 09:51:11 darkstar kernel: [ 2024.960036] Pid: 0, comm: swapper Tainted: G W 2.6.26-smp #2 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c0180f5d>] print_trailer+0xad/0xf0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c018103b>] check_bytes_and_report+0x9b/0xc0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c018145e>] check_object+0x19e/0x1e0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c01821a4>] __slab_alloc+0x454/0x4f0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c01834d6>] __kmalloc_track_caller+0xe6/0xf0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c03dd1ec>] ? dev_alloc_skb+0x1c/0x30 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c03dce79>] __alloc_skb+0x49/0x100 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c03dd1ec>] dev_alloc_skb+0x1c/0x30 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<f8a58599>] ath5k_rxbuf_setup+0x39/0x200 [ath5k] Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<f8a5a697>] ath5k_tasklet_rx+0x127/0x5c0 [ath5k] Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c014969a>] ? print_lock_contention_bug+0x1a/0xe0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c012eafc>] tasklet_action+0x4c/0xc0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c012e463>] __do_softirq+0x93/0x120 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c012e547>] do_softirq+0x57/0x60 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c012ea29>] irq_exit+0x69/0x80 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c0106b55>] do_IRQ+0x45/0x80 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c010a5d0>] ? mwait_idle+0x0/0x50 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c0104752>] common_interrupt+0x2e/0x34 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c010a5d0>] ? mwait_idle+0x0/0x50 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c010a609>] ? mwait_idle+0x39/0x50 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c01026e0>] cpu_idle+0x60/0xd0 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] [<c043c8ce>] rest_init+0x4e/0x60 Jul 31 09:51:11 darkstar kernel: [ 2024.960036] ======================= The syslog detail please see the attachment. -- Regards dave
Attachment:
dmesg.gz
Description: GNU Zip compressed data