> -----Original Message-----
> From: Hannes Reinecke [mailto:hare@xxxxxxx]
> Sent: July-16-15 7:11 AM
>
> > When the hang occurs shost->host_busy == 2 and shost->host_failed == 1
> > in the scsi_eh_wakeup function. However this function only wakes the
> > error handler if host_busy == host_failed.
> >
> Which just means that one command is still outstanding, and we need to wait
> for it to complete.
> But see below...

So maybe the root cause of the hang is that the second command never completes? Or could host_failed being non-zero be blocking something in the port multiplier code?

> Hmm.
> I am really not sure about this.

I wasn't sure either; that is one reason why I posted the patch.

> 'host_busy' indicates the number of outstanding commands, and
> 'host_failed' is the number of commands which have failed (on the ground
> that failed commands are considered outstanding, too).
>
> So the first hunk would change the behaviour from 'start SCSI EH once all
> commands are completed or failed' to 'start SCSI EH for _any_ command if
> scsi_eh_wakeup is called'
> (note that shost_failed might be '0'...).
> Which doesn't sound right.

So could the patch create any problems by starting the EH any time scsi_eh_wakeup is called? Or is it just inefficient?

> I guess this needs further debugging to get to the bottom of it.

Any suggestions on things I could try? The fact that the problem goes away when I enable only one CPU core makes me think there is a race happening somewhere.

Kevin