Hi All, I used qla initiator driver based on 2.6.23-8 which was modified to support target mode and virtual port on AL topology. The issue does not related to the target. As I wrote in other thread it happens also when in one shell insmod'ing the driver and in other shell rmmod'ing it. My Oops is pointing as Andrew stated to the qla24xx_report_id_acquisition() routine, because after loading the driver I created vport, then rmmod'ing the driver without deleting the vport. In that case interrupt comes to report about the deletion of the vport, and request the dpc thread. So for sure we need to call qla2xxx_wake_dpc() routine instead of calling wake_up_process() routine, where we don't check the dpc_thread pointer. I am not sure this is enough but I can test it, because as I wrote again in other thread, and Vlad quote here, if someone request wake up of the dpc thread, check that it is not NULL, call to wake_up_process() routine, and exactly at this point other task got the CPU, set the pointer to NULL and stop the dpc thread, then we have a problem. Below is the Oops with Vlad fix revision 470 in SCST with the addition of the lock but without the change in qla_mbx.c. [67324.049497] [4525]: scst: exit_scst:1652:SCST unloaded [67324.765692] qla2xxx 0000:06:00.1: LIP reset occured (f8ef). [67325.295690] qla2xxx 0000:06:00.1: LIP occured (f8ef). [67325.315689] qla2xxx 0000:06:00.1: LOOP UP detected (4 Gbps). [67325.415700] Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP: [67325.416599] [<ffffffff80226b1e>] task_rq_lock+0x2e/0x90 [67325.418171] PGD 22d737067 PUD 22df2d067 PMD 0 [67325.419114] Oops: 0000 [1] SMP [67325.419736] CPU 0 [67325.420162] Modules linked in: qla2xxx loop scsi_transport_fc [67325.421333] Pid: 4382, comm: qla2xxx_12_dpc Tainted: GF 2.6.23.8-64bit-fc-nfs #10 [67325.422998] RIP: 0010:[<ffffffff80226b1e>] [<ffffffff80226b1e>] task_rq_lock+0x2e/0x90 [67325.424736] RSP: 0018:ffff81022e14bb00 EFLAGS: 00010086 [67325.425870] RAX: 0000000000000086 RBX: 000000000000000f RCX: 0000000000000000 [67325.427379] RDX: 0000000000000000 RSI: ffff81022e14bb80 RDI: 0000000000000000 [67325.428899] RBP: ffff81022e14bb20 R08: 0000000000000000 R09: ffff81022d160010 [67325.430406] R10: 0000000000000000 R11: ffffffff8050faf0 R12: ffffffff807a4000 [67325.431925] R13: 0000000000000000 R14: ffff81022e14bb80 R15: 0000000000000246 [67325.433433] FS: 0000000000000000(0000) GS:ffffffff8071d000(0000) knlGS:0000000000000000 [67325.435147] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b [67325.436371] CR2: 0000000000000008 CR3: 000000022dec0000 CR4: 00000000000006e0 [67325.437890] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [67325.439398] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [67325.440921] Process qla2xxx_12_dpc (pid: 4382, threadinfo ffff81022e14a000, task ffff81022c6040c0) [67325.442813] Stack: 000000000000000f ffff81022d160000 0000000000000000 ffffc20000038000 [67325.444551] ffff81022e14bbb0 ffffffff80226ddf ffff81022e14bbd0 0000000000000046 [67325.446143] 0000000000000000 ffff810008abc918 000000002c6042e0 ffff81022c6040c0 [67325.447697] Call Trace: [67325.448252] [<ffffffff80226ddf>] try_to_wake_up+0x2f/0x370 [67325.449485] [<ffffffff80237854>] lock_timer_base+0x34/0x70 [67325.450736] [<ffffffff8805a9cb>] :qla2xxx:qla24xx_process_response_queue+0x18b/0x1f0 [67325.452442] [<ffffffff8805bdd8>] :qla2xxx:qla24xx_intr_handler +0x158/0x200 [67325.453976] [<ffffffff80237710>] process_timeout+0x0/0x10 [67325.455196] [<ffffffff88054b0b>] :qla2xxx:qla2x00_mailbox_command +0x1db/0x5d0 [67325.456787] [<ffffffff80597980>] thread_return+0x0/0x5c0 [67325.457978] [<ffffffff80237854>] lock_timer_base+0x34/0x70 [67325.459214] [<ffffffff88056a17>] :qla2xxx:qla2x00_get_adapter_id +0x87/0x130 [67325.460765] [<ffffffff80237710>] process_timeout+0x0/0x10 [67325.461983] [<ffffffff8804fa9f>] :qla2xxx:qla2x00_configure_loop +0x33f/0x17a0 [67325.463573] [<ffffffff80237854>] lock_timer_base+0x34/0x70 [67325.464807] [<ffffffff880568ea>] :qla2xxx:qla2x00_get_retry_cnt +0x5a/0x100 [67325.466333] [<ffffffff8805882c>] :qla2xxx:__qla2x00_marker +0xec/0x110 [67325.467781] [<ffffffff880588b0>] :qla2xxx:qla2x00_marker+0x60/0x90 [67325.469157] [<ffffffff88051b3e>] :qla2xxx:qla2x00_abort_isp +0x24e/0x610 [67325.470638] [<ffffffff8804c290>] :qla2xxx:qla2x00_do_dpc+0x430/0x560 [67325.472052] [<ffffffff8804be60>] :qla2xxx:qla2x00_do_dpc+0x0/0x560 [67325.473431] [<ffffffff8804be60>] :qla2xxx:qla2x00_do_dpc+0x0/0x560 [67325.474820] [<ffffffff80242e8b>] kthread+0x4b/0x80 [67325.475902] [<ffffffff8020c608>] child_rip+0xa/0x12 [67325.477005] [<ffffffff80242e40>] kthread+0x0/0x80 [67325.478069] [<ffffffff8020c5fe>] child_rip+0x0/0x12 [67325.479171] [67325.479486] [67325.479486] Code: 49 8b 45 08 4c 89 e3 8b 40 18 48 8b 04 c5 40 8c 74 80 48 03 [67325.481393] RIP [<ffffffff80226b1e>] task_rq_lock+0x2e/0x90 [67325.482617] RSP <ffff81022e14bb00> [67325.483356] CR2: 0000000000000008 Message from syslogd@HP1b at Tue Jul 22 02:25:48 2008 ... Message from syslogd@HP1b at Tue Jul 22 02:25:48 2008 ... HP1b kernel: [67325.419114] Oops: 0000 [1] SMP HP1b kernel: [67325.483356] CR2: 0000000000000008 Gal. On Mon, 2008-07-28 at 22:14 +0400, Vladislav Bolkhovitin wrote: > James Bottomley wrote: > > On Mon, 2008-07-28 at 21:33 +0400, Vladislav Bolkhovitin wrote: > >> This patch fixes race on dpc_thread field of struct scsi_qla_host, > >> which can lead to crash on the module unload. > >> > >> This patch is against 2.6.26 > > > > I'm afraid adding a lock is almost certainly the wrong way to handle > > this type of failure. > > Why? It's simple and fully solves the problem. All the events, which > left unhandled, because there is nobody to wake up by > qla2xxx_wake_dpc(), are not relevant after the driver's shutdown. > > > What should be done is to make sure the qla is > > correctly shut down (i.e. no tasks requiring the dpc_thread can be > > performed) *before* killing the thread ... > > Sure, in ideal it would be the best approach. But, certainly, it would > be a lot more complicated and error-prone. > > From other side, actually, it doesn't matter much for me how it will be > fixed, if it's fixed. > > > it sounds like shutdown is > > slightly broken in the current driver ... could you post the oops > > details and we can try to work out what the problem is > > Gal, can you send the details, please? > > > James > > > > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html