We recently saw this as well. It's related to the number of targets that
go away simultaneously. There end up being many rport-delete items on the
work queue; when the first one stalls to flush the work queues, it starts
the second, which stops to flush, and so on.

My inclination is to look at what we have on the work queue and see if we
can circumvent some of the flush calls.

-- james s

Here's a backtrace:

rport-4:0-37: blocked FC remote port time out: removing target and saving binding
rport-4:0-42: blocked FC remote port time out: removing target and saving binding
rport-4:0-55: blocked FC remote port time out: removing target and saving binding
run_workqueue: recursion depth exceeded: 4
Call Trace:<ffffffff80146270>{flush_cpu_workqueue+96}
       <ffffffff8036a9b0>{_spin_lock_irqsave+32}
       <ffffffff80146473>{flush_workqueue+115}
       <ffffffff880a05b5>{:scsi_transport_fc:fc_rport_tgt_remove+85}
       <ffffffff801462e0>{flush_cpu_workqueue+208}
       <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
       <ffffffff8036a9b0>{_spin_lock_irqsave+32}
       <ffffffff80146473>{flush_workqueue+115}
       <ffffffff880a05b5>{:scsi_transport_fc:fc_rport_tgt_remove+85}
       <ffffffff801462e0>{flush_cpu_workqueue+208}
       <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
       <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
       <ffffffff8036a9b0>{_spin_lock_irqsave+32}
       <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
       <ffffffff80146473>{flush_workqueue+115}
       <ffffffff880a05b5>{:scsi_transport_fc:fc_rport_tgt_remove+85}
       <ffffffff8014617c>{worker_thread+508}
       <ffffffff8012f0e0>{default_wake_function+0}
       <ffffffff8012f0e0>{default_wake_function+0}
       <ffffffff80145f80>{worker_thread+0}
       <ffffffff8014a9c9>{kthread+217}
       <ffffffff8010edbe>{child_rip+8}
       <ffffffff8014a8f0>{kthread+0}
       <ffffffff8010edb6>{child_rip+0}

rport-4:0-53: blocked FC remote port time out: removing target and saving binding
run_workqueue: recursion depth exceeded: 5
Call Trace:<ffffffff80146270>{flush_cpu_workqueue+96}
       <ffffffff8036a9b0>{_spin_lock_irqsave+32}
       <ffffffff80146473>{flush_workqueue+115}
       <ffffffff880a05b5>{:scsi_transport_fc:fc_rport_tgt_remove+85}
       <ffffffff801462e0>{flush_cpu_workqueue+208}
       <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
       <ffffffff8036a9b0>{_spin_lock_irqsave+32}
       <ffffffff80146473>{flush_workqueue+115}
       <ffffffff880a05b5>{:scsi_transport_fc:fc_rport_tgt_remove+85}
       <ffffffff801462e0>{flush_cpu_workqueue+208}
       <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
       <ffffffff8036a9b0>{_spin_lock_irqsave+32}
       <ffffffff80146473>{flush_workqueue+115}
       <ffffffff880a05b5>{:scsi_transport_fc:fc_rport_tgt_remove+85}
       <ffffffff801462e0>{flush_cpu_workqueue+208}
       <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
       <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
       <ffffffff8036a9b0>{_spin_lock_irqsave+32}
       <ffffffff880a0fc0>{:scsi_transport_fc:fc_timeout_deleted_rport+0}
       <ffffffff80146473>{flush_workqueue+115}
       <ffffffff880a05b5>{:scsi_transport_fc:fc_rport_tgt_remove+85}
       <ffffffff8014617c>{worker_thread+508}
       <ffffffff8012f0e0>{default_wake_function+0}
       <ffffffff8012f0e0>{default_wake_function+0}
       <ffffffff80145f80>{worker_thread+0}
       <ffffffff8014a9c9>{kthread+217}
       <ffffffff8010edbe>{child_rip+8}
       <ffffffff8014a8f0>{kthread+0}
       <ffffffff8010edb6>{child_rip+0}

...
       <ffffffff8014a9c9>{kthread+217}
       <ffffffff8010edbe>{child_rip+8}
       <ffffffff8014a8f0>{kthread+0}
       <ffffffff8010edb6>{child_rip+0}

rport-4:0-38: blocked FC remote port time out: removing target and saving binding
run_workqueue: recursion depth exceeded: 30

-----Original Message-----
From: Andrew Vasquez [mailto:andrew.vasquez@xxxxxxxxxx]
Sent: Friday, December 02, 2005 11:29 AM
To: Michael Reed; linux-scsi@xxxxxxxxxxxxxxx; Smart, James; Christoph Hellwig
Subject: RE: 2.6.15-rc4 error messages with multiple qla2300 hba ports on fabric

> From: Michael Reed [mailto:mdr@xxxxxxx]
>

Sidenote: I'm on the east-coast until hopefully tonight -- won't have a
chance to look at debugging this for a couple of days...

> I've been testing with the qla2300 driver with 2.6.14.3 and 2.6.15-rc4.
> I've observed two sets of error messages which are not present with
> 2.6.14.3.
>
> First, the qla2300 driver is generating soft lockups.

Have a backtrace?

> Second, several error messages indicating that remote
> ports are being deleted are being emitted.
>
> rport-2:0-16: blocked FC remote port time out: removing target and saving binding
> run_workqueue: recursion depth exceeded: 29
>
> If the timing is just right, scsi errors are generated, though not evident
> in the attached dmesg file.
>
> I've observed similar behavior with my modified mpt fusion driver
> when multiple hba ports are on the fabric. The kernels tested
> are as downloaded from kernel.org, without my mpt mods.
>
> (Andrew, I'm not "blaming" your driver for the rport issues. I chose
> your driver to be the "victim" 'cause I didn't want to post this using
> under development code with mpt fusion.)
>
> Platform: SGI Altix IA64.
>
> What additional information should I acquire?
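
For anyone following along, the nesting described at the top of this
message (each queued delete item flushing the same queue from inside the
worker, so the flush runs the next item by hand) can be sketched with a toy
model. This is not the kernel code: ToyWorkQueue, its methods, simulate()
and the warning threshold of 3 are all assumptions made up for illustration.

```python
DEPTH_WARN = 3  # assumed warning threshold for this toy model


class ToyWorkQueue:
    """Single-threaded toy model of a work queue whose items flush the queue."""

    def __init__(self, pending):
        self.pending = pending   # queued delete items not yet started
        self.run_depth = 0       # current nesting of run()
        self.max_depth = 0       # deepest nesting observed
        self.warnings = []       # depths at which the threshold was exceeded

    def run(self):
        """Process queued items; may be re-entered through flush()."""
        self.run_depth += 1
        self.max_depth = max(self.max_depth, self.run_depth)
        if self.run_depth > DEPTH_WARN:
            self.warnings.append(self.run_depth)
        while self.pending > 0:
            self.pending -= 1
            self.delete_item()
        self.run_depth -= 1

    def flush(self):
        """Flush requested from inside the worker: run the rest by hand."""
        self.run()

    def delete_item(self):
        """Hypothetical delete handler that flushes before it finishes."""
        self.flush()


def simulate(n_items):
    """Queue n_items delete items and return (max nesting, warning depths)."""
    wq = ToyWorkQueue(n_items)
    wq.run()
    return wq.max_depth, wq.warnings


if __name__ == "__main__":
    depth, warnings = simulate(5)
    print("max nesting:", depth)
    print("warned at depths:", warnings)
```

In this model the nesting grows by one for every item that is still queued
when the previous item flushes, which is why removing some of the flush
calls (or not flushing from within the worker) would keep it flat.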