Possible Bug in 3.8.0rc4 kernel

Hi Everyone,

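Under heavy load I've managed to produce a kernel panic, but I'm having trouble tracking down the cause.  The trace mentions several components: the LIO target core (target_core_mod), qla2xxx, and further down the bnx2 network driver.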
I have a diagram of my server layout here: http://ceph.com/wp-content/uploads/2012/12/2012-12-10_13-51-31.png
The LIO portion is running on dlcephproxy01a and dlcephproxy01b.  They are not load balanced; they are set up for failover.  If one fails, VMware should move to the other proxy machine.  I noticed that when this kernel panic happens on one machine and VMware switches over to the other, the other proxy fails almost immediately.  I haven't been able to capture the kernel messages from that second failure, though.  LIO has Ceph block devices mounted on both proxy servers with the same UUID, so VMware sees them as one disk.  I have one proxy per fabric, A and B.


Here are the kernel messages from when it started:

[21084.228438] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1159432
[21084.228518] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1188824
[21084.228522] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1194632
[21084.228525] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1149312
[21084.228527] ABORT_TASK: Found referenced qla2xxx task_tag: 1152172
[21084.235317] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1159520
[21084.236891] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1159564
[21084.238471] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1159608
[21084.240043] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1159652
[21084.241614] ABORT_TASK: Found referenced qla2xxx task_tag: 1157540
[21084.496096] ------------[ cut here ]------------
[21084.497830] Kernel BUG at ffffffff8107a4af [verbose debug info unavailable]
[21084.499699] invalid opcode: 0000 [#1] SMP
[21084.500064] Modules linked in: netconsole target_core_pscsi target_core_file target_core_iblock ib_srpt tcm_qla2xxx tcm_loop tcm_fc iscsi_target_mod target_core_mod qla2xxx ib_cm ib_sa ib_mad ib_core libfc ext2 bonding radeon coretemp kvm ttm gpio_ich drm_kms_helper i5000_edac drm edac_core psmouse i5k_amb ipmi_si lpc_ich microcode ipmi_msghandler configfs i2c_algo_bit shpchp mac_hid serio_raw hpilo hpwdt rbd libceph lp parport hid_generic usbhid hid btrfs zlib_deflate libcrc32c hpsa scsi_transport_fc bnx2 cciss scsi_tgt [last unloaded: qla2xxx]
[21084.500064] CPU 0
[21084.500064] Pid: 3914, comm: kworker/u:4 Not tainted 3.8.0-030800rc4-generic #201301172335 HP ProLiant DL360 G5
[21084.500064] RIP: 0010:[<ffffffff8107a4af>]  [<ffffffff8107a4af>] __cancel_work_timer+0x8f/0xa0
[21084.500064] RSP: 0018:ffff880126337ce8  EFLAGS: 00010246
[21084.500064] RAX: 0000000000000200 RBX: ffff880124b45118 RCX: 0000000000000000
[21084.500064] RDX: 0000000000000001 RSI: ffff8801248e1740 RDI: ffff880126337cc0
[21084.500064] RBP: ffff880126337d18 R08: ffff880126336000 R09: 0000000000000001
[21084.500064] R10: 0000000000000020 R11: 0000000000000001 R12: 0000000000000000
[21084.500064] R13: 0000000000000000 R14: ffff880125e2c680 R15: ffff88012805c5a0
[21084.544012] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1160576
[21084.544036] ABORT_TASK: Found referenced qla2xxx task_tag: 1140512
[21084.500064] FS:  0000000000000000(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
[21084.500064] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[21084.500064] CR2: 0000000000a3e000 CR3: 0000000125d1d000 CR4: 00000000000007f0
[21084.500064] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[21084.500064] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[21084.500064] Process kworker/u:4 (pid: 3914, threadinfo ffff880126336000, task ffff8801248e1740)
[21084.500064] Stack:
[21084.500064]  0000000000000286 0000000000000296 ffffffff816eb319 ffff880124b44f90
[21084.500064]  ffff880124b45028 000000000011a9a4 ffff880126337d28 ffffffff8107a4f0
[21084.500064]  ffff880126337d98 ffffffffa008ddaa 0000000000000082 ffff880127381000
[21084.500064] Call Trace:
[21084.500064]  [<ffffffff816eb319>] ? _raw_spin_unlock_irqrestore+0x19/0x30
[21084.500064]  [<ffffffff8107a4f0>] cancel_work_sync+0x10/0x20
[21084.500064]  [<ffffffffa008ddaa>] core_tmr_abort_task+0x17a/0x240 [target_core_mod]
[21084.500064]  [<ffffffffa008f9ff>] target_tmr_work+0xcf/0xf0 [target_core_mod]
[21084.500064]  [<ffffffff81078cd0>] process_one_work+0x130/0x480
[21084.632006]  [<ffffffffa008f930>] ? transport_cmd_check_stop+0x160/0x160 [target_core_mod]
[21084.632006]  [<ffffffff81079a07>] worker_thread+0x167/0x400
[21084.632006]  [<ffffffff810798a0>] ? manage_workers+0x120/0x120
[21084.632006]  [<ffffffff8107ef20>] kthread+0xc0/0xd0
[21084.632006]  [<ffffffff8107ee60>] ? flush_kthread_worker+0xb0/0xb0
[21084.632006]  [<ffffffff816f3f6c>] ret_from_fork+0x7c/0xb0
[21084.632006]  [<ffffffff8107ee60>] ? flush_kthread_worker+0xb0/0xb0
[21084.632006] Code: 66 90 66 90 48 89 df e8 e0 fe ff ff 48 8b 03 a8 01 74 18 45 85 ed 48 c7 03 00 02 02 00 0f 95 c0 48 83 c4 18 5b 41 5c 41 5d 5d c3 <0f> 0b 48 89 df e8 b7 fe ff ff e9 7a ff ff ff 66 90 66 66 66 66
[21084.632006] RIP  [<ffffffff8107a4af>] __cancel_work_timer+0x8f/0xa0
[21084.632006]  RSP <ffff880126337ce8>
[21084.698021] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1152172
[21084.707218] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1153580
[21084.723240] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 1153272
[21084.734054] ---[ end trace 8e5f870637161349 ]---
[21084.734104] BUG: unable to handle kernel paging request at ffffffffffffffd8
[21084.734111] IP: [<ffffffff8107f350>] kthread_data+0x10/0x20
[21084.734113] PGD 1c0f067 PUD 1c10067 PMD 0
[21084.734115] Oops: 0000 [#2] SMP
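
A note on reading the first oops: the bytes marked <0f> 0b in the "Code:" line are the x86 UD2 instruction, which is what BUG() emits on x86. That matches the "Kernel BUG at" and "invalid opcode" lines, so the first failure looks like a BUG() check reached at __cancel_work_timer+0x8f rather than a bad pointer. The sketch below only illustrates how to pull those bytes out of a "Code:" line; the helper name and the shortened sample line are made up for the example.

```python
# Opcode bytes of the x86 UD2 instruction, used by BUG() on x86.
UD2 = ["0f", "0b"]


def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the one marked <..> in an oops 'Code:' line.

    The marked byte is the first byte at the faulting instruction pointer.
    Returns an empty list if no byte is marked.
    """
    tokens = code_line.split("Code:", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return [tok.strip("<>")] + tokens[i + 1:i + count]
    return []


# Shortened excerpt of the Code: line from the trace above (tail only).
CODE_LINE = "[21084.632006] Code: 41 5c 41 5d 5d c3 <0f> 0b 48 89 df e8 b7 fe ff ff"

hit_bug_check = faulting_bytes(CODE_LINE) == UD2
print("faulting bytes:", faulting_bytes(CODE_LINE), "UD2:", hit_bug_check)
```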

A little further down, the log shows a warning whose trace goes through warn_slowpath_common and names the network driver (bnx2):
[21084.736010] WARNING: at /home/apw/COD/linux/net/core/skbuff.c:573 skb_release_head_state+0x10f/0x120()
[21084.736010]  <NMI>  [<ffffffff810599af>] warn_slowpath_common+0x7f/0xc0
[21084.736010]  [<ffffffff81059a0a>] warn_slowpath_null+0x1a/0x20
[21084.736010]  [<ffffffff8106218a>] local_bh_enable_ip+0x7a/0xa0
[21084.736010]  [<ffffffff816eb219>] _raw_spin_unlock_bh+0x19/0x20
[21084.736010]  [<ffffffffa002e986>] bnx2_reg_rd_ind+0x46/0x60 [bnx2]
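
To see at a glance which modules show up in traces like these, the frames can be extracted with a small script. This is a rough sketch, assuming the 3.8-era "[<address>] symbol+0xoff/0xsize [module]" frame format shown above; the function name and the "vmlinux" label for built-in symbols are choices made for this example. Frames prefixed with "?" are addresses found on the stack that the unwinder could not confirm, so they are flagged as unreliable.

```python
import re

# One stack frame: [<addr>] [? ]symbol+0xoff/0xsize [module]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<maybe>\? )?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_frames(text):
    """Return one dict per stack frame found in the kernel log text."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append({
                "symbol": m.group("sym"),
                "offset": int(m.group("off"), 16),
                # Frames without a [module] suffix belong to the core kernel.
                "module": m.group("mod") or "vmlinux",
                # A leading '?' marks a frame the unwinder could not verify.
                "reliable": m.group("maybe") is None,
            })
    return frames


# A few frames copied from the traces above.
SAMPLE = """\
[21084.500064]  [<ffffffff816eb319>] ? _raw_spin_unlock_irqrestore+0x19/0x30
[21084.500064]  [<ffffffff8107a4f0>] cancel_work_sync+0x10/0x20
[21084.500064]  [<ffffffffa008ddaa>] core_tmr_abort_task+0x17a/0x240 [target_core_mod]
[21084.736010]  [<ffffffffa002e986>] bnx2_reg_rd_ind+0x46/0x60 [bnx2]
"""

frames = parse_frames(SAMPLE)
modules = [f["module"] for f in frames if f["module"] != "vmlinux"]
print(modules)
```

On the sample frames this lists target_core_mod and bnx2, the two loadable modules that appear in the excerpts.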


Chris Holcombe
Unix Administrator
Corporation Service Company
cholcomb@xxxxxxxxxxx
302-636-8667
