On 04/06/2010 05:14 AM, Ravi Anand wrote:
@@ -888,7 +890,7 @@ static void qla4xxx_flush_active_srbs(struct scsi_qla_host *ha) srb = qla4xxx_del_from_active_array(ha, i); if (srb != NULL) { srb->cmd->result = DID_RESET<< 16; - qla4xxx_srb_compl(ha, srb); + kref_put(&srb->srb_ref, qla4xxx_srb_compl); }
* * This routine waits for the command to be returned by the Firmware @@ -1507,17 +1509,14 @@ static int qla4xxx_eh_wait_on_command(struct scsi_qla_host *ha, struct scsi_cmnd *cmd) { int done = 0; - struct srb *rp; uint32_t max_wait_time = EH_WAIT_CMD_TOV; do { /* Checking to see if its returned to OS */ - rp = (struct srb *) cmd->SCp.ptr; - if (rp == NULL) { + if (cmd->host_scribble == NULL) { done++; break; }
Is this racey? It seems like qla4xxx_flush_active_srbs will call qla4xxx_del_from_active_array and that will clear host_scribble, and here we will see it is NULL and proceed to complete the eh abort process. When we get back to qla4xxx_eh_abort we could call kref_put(&srb->srb_ref, qla4xxx_srb_compl) and then return SUCCESS to the scsi eh. The scsi eh will then begin to reuse the scsi cmd to send a TUR. But I think the problem could be that if happens before qla4xxx_flush_active_srbs has called its kref_put, then when qla4xxx_flush_active_srbs now calls it it is going to call qla4xxx_srb_compl and the driver will be completing a completely different command (it would now be the TUR).
I thought I had said this before, but it looks like I did not get a reply. I think you want to drop the kref coming into qla4xxx_eh_wait_on_command because nothing is touching the srb now. And then you want to add some locking in around the host_scribble checking maybe.
-- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html