I wrote on 2008-01-13:
> James Bottomley wrote:
>> Firewire list cc'd
>>> Jan 12 16:50:49 x3400 kernel: firewire_sbp2: orb reply timed out, rcode=0x11
>>> Jan 12 16:50:49 x3400 kernel: sd 11:0:0:0: [sdc] Result: hostbyte=DID_BUS_BUSY
>>> driverbyte=DRIVER_OK,SUGGEST_OK
>> Best I can tell, this is the source of the problem.  The sbp2 driver is
>> replying DID_BUS_BUSY until that gets sorted out, which seems to be
>> never.
>
> When something was plugged in or out at the same bus, fw-sbp2 has to
> reconnect == renew the login to each logical unit.  The syslog in the
> report is inconclusive whether that happened or failed.

In any case, commands are frequently retried or newly enqueued while
fw-sbp2 waits to get the login renewed.  (And fw-sbp2 continues to
complete them with DID_BUS_BUSY as long as the reconnection has not
succeeded.)  Whoever caused that I/O, e.g. dd like in the reporter's
and my own tests, will quickly fail.

> As a side note, the old sbp2 driver does not quit commands with
> DID_BUS_BUSY between bus reset and reconnect.  Instead it blocks the
> Scsi_Host in order to not receive commands during that time at all.

I experimented with this yesterday.  First I tried
scsi_internal_device_block() because we
  - want to block logical units individually if possible,
  - need to block from within atomic context (softirq context).

However, this failed miserably with all sorts of lock inversion bug
backtraces (alleged ones or real ones, I don't know) and with
occasional kernel lock-ups (so they were probably real lock
inversions).  These locking issues cannot be solved easily because the
block layer and scsi_lib play nauseating games with their locks.

So, I switched over to scsi_block_requests(), i.e. blocking the whole
host like the old sbp2 driver does.  This doesn't seem to have
scsi_internal_device_block()'s locking issues.  However, the old sbp2
driver has one Scsi_Host for each logical unit, while the new fw-sbp2
driver has one Scsi_Host for each target.
Hence there are difficulties with targets that have multiple logical
units, but I probably got them sorted out now.

There remain frequent problems with reconnection + re-login failures
though.  These failures don't happen with exactly the same bus
topology if I don't run I/O during the bus resets.  I have an idea
what to try next, though...
-- 
Stefan Richter
-=====-==--- --=- ---==
http://arcgraph.de/sr/