Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset

Mike Christie <michaelc@xxxxxxxxxxx> · Thu, 07 Mar 2013 13:19:39 -0600

Sorry for the late reply.

On 12/11/2012 02:23 AM, Hannes Reinecke wrote:
> @@ -793,7 +793,8 @@ struct scsi_host_template bfad_im_scsi_host_template = {
>  	.queuecommand = bfad_im_queuecommand,
>  	.eh_abort_handler = bfad_im_abort_handler,
>  	.eh_device_reset_handler = bfad_im_reset_lun_handler,
> -	.eh_bus_reset_handler = bfad_im_reset_bus_handler,
> +	.eh_target_reset_handler = fc_eh_it_nexus_loss_handler,
> +	.eh_bus_reset_handler = NULL,

Don't need to set to NULL in the final patch, and don't forget to send a
patch to remove all the code we do not need anymore :)

> +fc_eh_it_nexus_loss_handler(struct scsi_cmnd *cmnd)
> +{
> +	struct fc_internal *i = to_fc_internal(cmnd->device->host->transportt);
> +	struct scsi_target *starget = scsi_target(cmnd->device);
> +	struct fc_rport *rport = starget_to_rport(starget);
> +	int ret;
> +
> +	ret = fc_block_scsi_eh(cmnd);
> +	if (i->f->eh_it_nexus_loss)
> +		ret = i->f->eh_it_nexus_loss(cmnd);
> +
> +	/* FAST_IO_FAIL indicates the port is already blocked */
> +	if (ret == FAST_IO_FAIL)
> +		return ret;
> +	if (ret == SUCCESS)
> +		/* All outstanding I/O has been aborted */
> +		__fc_remote_port_delete(rport, -1);
> +	else {
> +		/* Failed to abort outstanding I/O, trigger FAST_IO_FAIL */
> +		__fc_remote_port_delete(rport, 0);

I think it looks ok from a high level, but I am not sure how the drivers
are working here.

What happens for lpfc? It seems __fc_remote_port_delete ends up calling
the fast io fail code right away and that sets
FC_RPORT_FAST_FAIL_TIMEDOUT. We will then call lpfc_terminate_rport_io
which only will send aborts for the commands. We will then call
fc_block_scsi_eh above and that returns FAST_IO_FAIL and we will pass
that back up to the scsi eh right away.

But it seems lpfc_terminate_rport_io does not wait for the abort
reposnses and clean up the affected scsi_cmnds, and it does not seem to
do something to prevent lpfc from touching affected scsi_cmnds, does it
(I could not find the code)? If lpfc ends up touching a scsi_cmnd after
we have return FAST_IO_FAIL from this function then both lpfc and some
other code could be using the same scsi_cmnd struct.

For qla2xxx, it seems qla2x00_terminate_rport_io aborts commands, but it
looks like there is a small race where if some other thread was actually
completing the command already, then that thread could be touching the
scsi command, but this function could return and the scsi eh could end
up giving the command to some other driver or retrying while the other
thread was still touching it.

It also seems like there is a race where since
qla2x00_terminate_rport_io also calls the logout functions for the port,
then if that path was fast enough it could it lead to
fc_remote_port_delete getting called by qla2xxx while
fc_eh_it_nexus_loss_handler's call to __fc_remote_port_delete was still
running?

> +		ret = fc_block_scsi_eh(cmnd);
> +	}
> +	if (ret != FAST_IO_FAIL) {
> +		if (rport->port_state == FC_PORTSTATE_ONLINE)
> +			ret = SUCCESS;
> +		else
> +			ret = FAILED;
> +	}
> +	return ret;
> +}
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html