On Wed, Nov 17, 2010 at 04:35:52PM +0200, Boaz Harrosh wrote: > On 11/17/2010 04:10 PM, Christof Schmitt wrote: > > On Wed, Nov 17, 2010 at 03:43:34PM +0200, Boaz Harrosh wrote: > >> On 11/17/2010 03:23 PM, Christof Schmitt wrote: > >>> From: Christof Schmitt <christof.schmitt@xxxxxxxxxx> > >>> > >>> Interrupting the connection to the FCP channel while I/O requests are > >>> being issues can lead to this deadlock. scsi_dispatch_cmd already > >>> holds the host_lock while the recovery trigger tries to acquire the > >>> host_lock again when iterating through the scsi_devices. > >>> > >>> INFO: lockdep is turned off. > >>> BUG: spinlock lockup on CPU#1, blast/9660, 0000000078f38878 > >>> CPU: 1 Not tainted 2.6.35.7SWEN2 #2 > >>> Process blast (pid: 9660, task: 0000000071f75940, ksp: 0000000074393ac0) > >>> 0000000074393640 00000000743935c0 0000000000000002 0000000000000000 > >>> 0000000074393660 00000000743935d8 00000000743935d8 00000000005590c2 > >>> 0000000000000000 0000000078f38878 0000000026ede800 0000000078f38878 > >>> 000000000000000d 040000000000000c 0000000074393628 0000000000000000 > >>> 0000000000000000 0000000000100b2a 00000000743935c0 0000000074393600 > >>> Call Trace: > >>> ([<0000000000100a32>] show_trace+0xee/0x144) > >>> [<00000000003be202>] do_raw_spin_lock+0x112/0x178 > >>> [<000000000055d408>] _raw_spin_lock_irqsave+0x90/0xb0 > >>> [<00000000003f1514>] __scsi_iterate_devices+0x38/0xbc > >>> [<00000000004849b0>] zfcp_erp_clear_adapter_status+0xd0/0x16c > >>> [<000000000048587a>] zfcp_erp_adapter_reopen+0x3a/0xb4 > >>> [<0000000000489812>] zfcp_fsf_req_send+0x166/0x180 > >>> [<000000000048c8d6>] zfcp_fsf_fcp_cmnd+0x272/0x408 > >>> [<000000000048f864>] zfcp_scsi_queuecommand+0x11c/0x1e0 > >>> [<00000000003f1f2a>] scsi_dispatch_cmd+0x1d6/0x324 > >>> [<00000000003f9910>] scsi_request_fn+0x42c/0x56c > >>> [<00000000003828ae>] __blk_run_queue+0x86/0x140 > >>> [<000000000037f742>] elv_insert+0x11a/0x208 > >>> [<000000000038104c>] blk_insert_cloned_request+0x84/0xe4 > >>> [<000003c0032b7c64>] dm_dispatch_request+0x6c/0x94 [dm_mod] > >>> [<000003c0032b7d5c>] map_request+0xd0/0x100 [dm_mod] > >>> [<000003c0032b9a78>] dm_request_fn+0xec/0x1bc [dm_mod] > >>> [<0000000000382c0e>] generic_unplug_device+0x5a/0x6c > >>> [<000003c0032b7f98>] dm_unplug_all+0x74/0x9c [dm_mod] > >>> [<00000000001d1272>] sync_page+0x76/0x9c > >>> [<00000000001d12ba>] sync_page_killable+0x22/0x60 > >>> [<000000000055a768>] __wait_on_bit_lock+0xc0/0x124 > >>> [<00000000001d1140>] __lock_page_killable+0x78/0x84 > >>> [<00000000001d351c>] generic_file_aio_read+0x5a4/0x7e8 > >>> [<0000000000228ec0>] do_sync_read+0xc8/0x12c > >>> [<0000000000229edc>] vfs_read+0xac/0x1ac > >>> [<000000000022a0d8>] SyS_read+0x58/0xa8 > >>> [<00000000001146de>] sysc_noemu+0x10/0x16 > >>> [<00000200000493c4>] 0x200000493c4 > >>> INFO: lockdep is turned off. > >>> > >>> Call zfcp_fsf_fcp_cmnd without the host_lock and disable the > >>> interrupts when acquiring the req_q_lock. > >>> > >>> Reviewed-by: Swen Schillig <swen@xxxxxxxxxxxx> > >>> Signed-off-by: Christof Schmitt <christof.schmitt@xxxxxxxxxx> > >>> --- > >>> > >>> drivers/s390/scsi/zfcp_fsf.c | 5 +++-- > >>> drivers/s390/scsi/zfcp_scsi.c | 3 +++ > >>> 2 files changed, 6 insertions(+), 2 deletions(-) > >>> > >>> --- a/drivers/s390/scsi/zfcp_fsf.c > >>> +++ b/drivers/s390/scsi/zfcp_fsf.c > >>> @@ -2170,12 +2170,13 @@ int zfcp_fsf_fcp_cmnd(struct scsi_cmnd * > >>> struct zfcp_adapter *adapter = zfcp_sdev->port->adapter; > >>> struct zfcp_qdio *qdio = adapter->qdio; > >>> struct fsf_qtcb_bottom_io *io; > >>> + unsigned long flags; > >>> > >>> if (unlikely(!(atomic_read(&zfcp_sdev->status) & > >>> ZFCP_STATUS_COMMON_UNBLOCKED))) > >>> return -EBUSY; > >>> > >>> - spin_lock(&qdio->req_q_lock); > >>> + spin_lock_irqsave(&qdio->req_q_lock, flags); > >>> if (atomic_read(&qdio->req_q_free) <= 0) { > >>> atomic_inc(&qdio->req_q_full); > >>> goto out; > >>> @@ -2239,7 +2240,7 @@ failed_scsi_cmnd: > >>> zfcp_fsf_req_free(req); > >>> scsi_cmnd->host_scribble = NULL; > >>> out: > >>> - spin_unlock(&qdio->req_q_lock); > >>> + spin_unlock_irqrestore(&qdio->req_q_lock, flags); > >>> return retval; > >>> } > >>> > >>> --- a/drivers/s390/scsi/zfcp_scsi.c > >>> +++ b/drivers/s390/scsi/zfcp_scsi.c > >>> @@ -83,6 +83,7 @@ static int zfcp_scsi_queuecommand_lck(st > >>> struct zfcp_adapter *adapter = zfcp_sdev->port->adapter; > >>> struct fc_rport *rport = starget_to_rport(scsi_target(scpnt->device)); > >>> int status, scsi_result, ret; > >>> + struct scsi_device *sdev = scpnt->device; > >>> > >>> /* reset the status for this request */ > >>> scpnt->result = 0; > >>> @@ -118,7 +119,9 @@ static int zfcp_scsi_queuecommand_lck(st > >>> return 0; > >>> } > >>> > >>> + spin_unlock_irq(sdev->host->host_lock); > >>> ret = zfcp_fsf_fcp_cmnd(scpnt); > >>> + spin_lock_irq(sdev->host->host_lock); > >> > >> CCing Jeff > >> > >> that locks is taken in your own driver three lines below at the > >> DEF_SCSI_QCMD macro invocation > >> > >> Please do the proper host-lock-removal. The first time you are > >> touching this code. (See example patch to libata by Jeff Garzik) > > > > With the current code, the serial_number has to be updated for each > > command since scsi_error still has the check for the serial_number in > > scsi_try_to_abort_cmd. The change above is a bug fix for the zfcp > > changes introduced in 2.6.37-rc1, and i would like to fix this now. I > > remember from the host_lock discussion that the serial_number changes > > will happen in the 2.6.38 kernel. To me, it looks like there a two > > changes needed anyway. > > > > Sigh! yes this is a deficiency I don't like about the current > push-down. I think we could do Nic's atomic serial_number as an > intermediate solution, or the pending fix to scsi_eh, and not force > a double change to every driver like we have now. > > But I still think it's better to open code DEF_SCSI_QCMD than > to have that ugly > unlock > ... > lock > > That looks like an ugly typo (And is slower actually) > > Jeff ? > Boaz I have just seen the patch "[PATCH] Eliminate error handler overload of the SCSI serial number" from James. This means, i don't have to worry about the serial_number at all. I will send a new patch that removes the host_lock from zfcp to fix the locking issue in zfcp. Christof > > > Either > > - make the above change now (as bug fix) > > - remove the host_lock from zfcp's queuecommand function when the > > serial_number becomes optional > > or > > - change the queuecommand function now to include: > > take host_lock, call scsi_cmd_get_serial, release host_lock > > - remove the sequence again when the serial_number becomes optional > > > > I opted for the first approach, to have a smaller patch now. If the > > second approach is preferred, i can send an updated patch. > > > > Christof > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html