Hannes,

I rebuilt a kernel with your patch on a 2.6.28-rc8 kernel and hit the same panic as before. The attached trace is for your reference.

Thanks.
Harris

-----Original Message-----
From: Hannes Reinecke [mailto:hare@xxxxxxx]
Sent: Wednesday, December 17, 2008 1:33 AM
To: Shi, Harris
Cc: malahal@xxxxxxxxxx; Mike Anderson; SCSI development list
Subject: Re: question on block-layer timeout change

Hi Harris,

Shi, Harris wrote:
> Hannes,
>
> Just let you know that the same panic is still there in SLES11RC1. Philip should
> be able to help us to report it in Novell Bugzilla.
>
Ah. It might be related to the wrong scsi_device_online() check
(cf my patch 'Check for deleted device in scsi_device_online()'
earlier on this list).

Problem is that the error handler just checks with scsi_device_online()
if a command can be sent down the wire. And so for failed/deleted
devices the error handler just barfs at random places.

Can you check if this patch resolves the issue?

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@xxxxxxx                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

Information from /var/log/messages:
===================================
Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] Sense Key : Recovered Error [current]
Dec 17 15:58:14 timon kernel: sd 6:0:0:2: [sdd] <<vendor>> ASC=0x95 ASCQ=0x1ASC=0x95 ASCQ=0x1
Dec 17 15:58:25 timon kernel: connection2:0: ping timeout of 15 secs expired, last rx 19237, last ping 20487, now 24237
Dec 17 15:58:25 timon kernel: connection2:0: detected conn error (1011)
Dec 17 15:58:26 timon iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)

Information from Serial output:
===============================
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index1/shared_cpu_map
Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i6
IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
*pdpt = 00000000319fe001 *pde = 0000000000000000
BUG: unable to handle kernel NULL pointer dereference at 00000086
IP: [<c011a274>] __ticket_spin_lock+0x8/0x19
*pdpt = 0000000000546001 *pde = 0000000000000000
ipv6 af_packet microcode fuse loop dm_mod mptctl e1000 iTCO_wdt sr_mod video iTCO_vendor_support e752x_edac output shpchp ]

Pid: 0, comm: swapper Not tainted (2.6.28-rc8-test-1-pae #1) PowerEdge 2850
EIP: 0060:[<c011a274>] EFLAGS: 00010086 CPU: 3
EIP is at __ticket_spin_lock+0x8/0x19
EAX: 00000086 EBX: f10f6380 ECX: f20b5400 EDX: 00000100
ESI: f18223b0 EDI: 00000000 EBP: f38a5e78 ESP: f38a5e78
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=f38a4000 task=f38a2fd0 task.ti=f38a4000)
Stack:
 f38a5e80 c0328e0f f38a5e98 f9298389 00000002 f10f6380 f18223b0 00000000
 f38a5ea4 f7e13396 f11b9300 f38a5eb0 c0212539 f11b9300 f38a5ed4 c02125f2
 f18225b8 00000282 f389c000 f18224f4 00000100 f389c000 c0212573 f38a5f08
Call Trace:
 [<c0328e0f>] ? _spin_lock+0x15/0x18
 [<f9298389>] ? iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi]
 [<f7e13396>] ? scsi_times_out+0x35/0x61 [scsi_mod]
 [<c0212539>] ? blk_rq_timed_out+0xc/0x46
 [<c02125f2>] ? blk_rq_timed_out_timer+0x7f/0xf6
 [<c0212573>] ? blk_rq_timed_out_timer+0x0/0xf6
 [<c013744c>] ? run_timer_softirq+0x154/0x1c7
 [<c0133ab0>] ? __do_softirq+0x8d/0x133
 [<c0133b9e>] ? do_softirq+0x48/0x57
 [<c0133cad>] ? irq_exit+0x38/0x6d
 [<c0112e8d>] ? smp_apic_timer_interrupt+0x71/0x7f
 [<c0105f78>] ? apic_timer_interrupt+0x28/0x30
 [<c010a75d>] ? mwait_idle+0x32/0x40
 [<c0103c36>] ? cpu_idle+0x74/0x8e
 [<c0324407>] ? start_secondary+0x269/0x26e
Code: fe ff ff 5b eb 13 56 0f b7 d2 ff 75 08 89 d9 0f b6 c0 e8 44 fe ff ff 5a 59 8d 65 f8 5b 5e 5d c3 90 90 90 55 ba 00 01
EIP: [<c011a274>] __ticket_spin_lock+0x8/0x19 SS:ESP 0068:f38a5e78
Oops: 0002 [#2]
<0>Kernel panic - not syncing: Fatal exception in interrupt
------------[ cut here ]------------
WARNING: at kernel/smp.c:333 smp_call_function_mask+0x27/0x1b6()
Modules linked in: radeon drm agpgart crc32c libcrc32c ib_iser rdma_cm ib_cm nfs iw_cm lockd ib_sa ib_mad nfs_acl ib_core i]
Pid: 0, comm: swapper Tainted: G D 2.6.28-rc8-test-1-pae #1
Call Trace:
 [<c0326a95>] ? printk+0xf/0x12
 [<c012f173>] warn_on_slowpath+0x41/0x63
 [<c02207fc>] ? __const_udelay+0x2c/0x2e
 [<c02207fc>] ? __const_udelay+0x2c/0x2e
 [<c028afc1>] ? wait_for_xmitr+0x37/0x7d
 [<c028afc1>] ? wait_for_xmitr+0x37/0x7d
 [<c014d326>] smp_call_function_mask+0x27/0x1b6
 [<c011a346>] ? default_spin_lock_flags+0x15/0x1b
 [<c011a346>] ? default_spin_lock_flags+0x15/0x1b
 [<c0143b34>] ? down_trylock+0x20/0x29
 [<c012f4ea>] ? try_acquire_console_sem+0xd/0x2e
 [<c0157040>] ? crash_kexec+0x9f/0xa7
 [<c014d4d3>] smp_call_function+0x1e/0x25
 [<c0111980>] ? stop_this_cpu+0x0/0x51
 [<c0111956>] native_smp_send_stop+0x1b/0x45
 [<c03269ef>] panic+0x48/0xdf
 [<c0329a1b>] oops_end+0x8f/0xa3
 [<c01074c7>] die+0x5d/0x65
 [<c032b422>] do_page_fault+0x9ef/0xaf3
 [<c0124b8a>] ? enqueue_task_fair+0x1f/0x59
 [<c012120e>] ? resched_task+0x30/0x74
 [<c0128a9c>] ? try_to_wake_up+0x216/0x221
 [<c0128ab2>] ? default_wake_function+0xb/0xd
 [<c01403f2>] ? autoremove_wake_function+0xf/0x33
 [<c014044c>] ? wake_bit_function+0x36/0x43
 [<c0120b54>] ? __wake_up_common+0x35/0x5b
 [<c016e4dc>] ? mempool_free_slab+0xe/0x10
 [<c0124b63>] ? enqueue_entity+0x297/0x29f
 [<c0124b8a>] ? enqueue_task_fair+0x1f/0x59
 [<c012057e>] ? enqueue_task+0x4c/0x58
 [<c012120e>] ? resched_task+0x30/0x74
 [<c032aa33>] ? do_page_fault+0x0/0xaf3
 [<c0329132>] error_code+0x72/0x80
 [<c014007b>] ? kthreadd+0xa8/0x156
 [<c011a274>] ? __ticket_spin_lock+0x8/0x19
 [<c0328e0f>] _spin_lock+0x15/0x18
 [<f9298389>] iscsi_eh_cmd_timed_out+0x24/0xb0 [libiscsi]
 [<f7e13396>] scsi_times_out+0x35/0x61 [scsi_mod]
 [<c0212539>] blk_rq_timed_out+0xc/0x46
 [<c02125f2>] blk_rq_timed_out_timer+0x7f/0xf6
 [<c0212573>] ? blk_rq_timed_out_timer+0x0/0xf6
 [<c013744c>] run_timer_softirq+0x154/0x1c7
 [<c0133ab0>] __do_softirq+0x8d/0x133
 [<c0133b9e>] do_softirq+0x48/0x57
 [<c0133cad>] irq_exit+0x38/0x6d
 [<c0112e8d>] smp_apic_timer_interrupt+0x71/0x7f
 [<c0105f78>] apic_timer_interrupt+0x28/0x30
 [<c010a75d>] ? mwait_idle+0x32/0x40
 [<c0103c36>] cpu_idle+0x74/0x8e
 [<c0324407>] start_secondary+0x269/0x26e
---[ end trace d93b1e584c414659 ]---