Re: [RFC PATCH 4/7] fc class: don't return from fc_block_scsi_eh until IO has been cleaned up

Mike Christie <michaelc@xxxxxxxxxxx> · Thu, 23 Sep 2010 00:47:00 -0500

On 09/23/2010 12:17 AM, michaelc@xxxxxxxxxxx wrote:
From: Mike Christie <michaelc@xxxxxxxxxxx>

If an LLD does:

         ret = fc_block_scsi_eh(cmnd);
         if (ret)
                 return ret;

in its eh callbacks, the following race can occur:

1. The LLD calls fc_block_scsi_eh from the scsi eh thread.
2. In the FC class thread, the fast io fail tmo fires, sets
FC_RPORT_FAST_FAIL_TIMEDOUT, and then starts calling terminate_rport_io.
3. The scsi eh thread and the LLD then break out of the
fc_block_scsi_eh wait loop and return FAST_IO_FAIL.
4. The scsi eh then assumes it owns the command and starts to
process it. It calls scsi_eh_flush_done_q, which may fail the command
or retry it.
5. Meanwhile, in the FC class thread, the LLD's terminate_rport_io
callback may still be processing the IO and could access a scsi_cmnd
struct that the scsi eh thread has already retried, or failed and
reallocated to a new request, in step 4.
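To make the interleaving concrete, here is a single-threaded userspace replay of steps 4 and 5. It is a sketch, not kernel code: struct model_cmd, its generation counter, and the model_* helpers are hypothetical stand-ins, and "reuse" is modeled simply as bumping a counter that the LLD compares against the value it saved earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a scsi_cmnd: generation is bumped each time
 * the midlayer reuses the struct for a new request. */
struct model_cmd {
	int generation;
};

/* Step 4: the eh thread believes it owns the command and fails or retries
 * it, after which the struct may be reused for another request. */
static void model_eh_flush_done_q(struct model_cmd *cmd)
{
	cmd->generation++;
}

/* Step 5: the LLD's terminate_rport_io callback still holds a pointer it
 * saved earlier. Returns true if that pointer now refers to a reused cmd. */
static bool model_terminate_touches_stale(const struct model_cmd *cmd,
					  int saved_gen)
{
	return cmd->generation != saved_gen;
}

/* Replays the race in one thread, in the order of the numbered steps. */
static bool replay_race(void)
{
	struct model_cmd cmd = { .generation = 0 };
	int lld_saved_gen = cmd.generation;   /* LLD tracks the outstanding IO */

	/* Steps 1-3: fast_io_fail_tmo fires and fc_block_scsi_eh returns
	 * FAST_IO_FAIL while terminate_rport_io is still running. */
	model_eh_flush_done_q(&cmd);                               /* step 4 */
	return model_terminate_touches_stale(&cmd, lld_saved_gen); /* step 5 */
}
```

replay_race() returns true, meaning the LLD's callback ends up looking at a command the eh thread has already handed back.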

This patch makes fc_block_scsi_eh wait until the terminate_rport_io
callback has completed before returning, so LLDs do not have to
worry about the race.
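The ordering the patch aims for can be sketched in userspace with a mutex and condition variable. This is an illustrative model under stated assumptions, not the patch itself: struct model_rport, model_fc_block_scsi_eh, and MODEL_FAST_IO_FAIL are hypothetical, and the terminate_done flag stands in for "the LLD's terminate_rport_io callback has returned". The eh side waits on that flag instead of returning as soon as FC_RPORT_FAST_FAIL_TIMEDOUT is set.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of an rport; not the kernel's struct fc_rport. */
struct model_rport {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool terminate_done;   /* set once the LLD's terminate_rport_io has returned */
	int events[2];         /* order in which the two threads finished */
	int nevents;
};

enum { EV_TERMINATE_DONE = 1, EV_EH_RETURNED = 2 };
#define MODEL_FAST_IO_FAIL 1   /* hypothetical stand-in for FAST_IO_FAIL */

/* Caller must hold rp->lock. */
static void record_event(struct model_rport *rp, int ev)
{
	rp->events[rp->nevents++] = ev;
}

/* FC class thread: fast_io_fail_tmo has fired and the LLD cleans up its IO. */
static void *fc_class_thread(void *arg)
{
	struct model_rport *rp = arg;

	/* ... the LLD's terminate_rport_io callback would run here ... */

	pthread_mutex_lock(&rp->lock);
	rp->terminate_done = true;
	record_event(rp, EV_TERMINATE_DONE);
	pthread_cond_broadcast(&rp->cond);
	pthread_mutex_unlock(&rp->lock);
	return NULL;
}

/* SCSI eh thread: return only after terminate_rport_io has completed. */
static int model_fc_block_scsi_eh(struct model_rport *rp)
{
	pthread_mutex_lock(&rp->lock);
	while (!rp->terminate_done)
		pthread_cond_wait(&rp->cond, &rp->lock);
	record_event(rp, EV_EH_RETURNED);
	pthread_mutex_unlock(&rp->lock);
	return MODEL_FAST_IO_FAIL;
}

/* Runs both sides concurrently; true if eh returned only after cleanup. */
static bool run_model(void)
{
	struct model_rport rp = { .terminate_done = false, .nevents = 0 };
	pthread_t fc;
	int ret;

	pthread_mutex_init(&rp.lock, NULL);
	pthread_cond_init(&rp.cond, NULL);
	if (pthread_create(&fc, NULL, fc_class_thread, &rp) != 0)
		return false;
	ret = model_fc_block_scsi_eh(&rp);
	pthread_join(fc, NULL);
	pthread_cond_destroy(&rp.cond);
	pthread_mutex_destroy(&rp.lock);

	assert(rp.nevents == 2);
	return ret == MODEL_FAST_IO_FAIL &&
	       rp.events[0] == EV_TERMINATE_DONE &&
	       rp.events[1] == EV_EH_RETURNED;
}
```

Whatever order the scheduler picks, the eh side can only record its event after the FC side has set terminate_done, so the recorded order is always cleanup first, then FAST_IO_FAIL.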

I think this is not going to work for drivers like lpfc, and even
for qla2xxx in the ISP_ABORT case, because even after
terminate_rport_io has completed, the driver can still touch the
scsi_cmnd.