Problem handling task management functions in qla2xxx

Vladislav Bolkhovitin <vst@xxxxxxxx> · Tue, 22 Aug 2006 18:25:28 +0400

Hello,

If a task management function is issued (e.g. with the sg_reset utility, 
which is the easiest way) during active I/O to a qla2xxx device (ISP2422), 
it often fails with messages like:

------------------------------------------------------------------

qla2xxx 0000:04:02.0: scsi(13:0:1): DEVICE RESET ISSUED.
qla2xxx 0000:04:02.0: qla2xxx_eh_device_reset: failed while waiting for commands

------------------------------------------------------------------

This can break the SCSI mid-level's error recovery and cause devices to 
be erroneously taken offline when they are actually healthy.

I investigated and found that the driver waits for a limited time for 
the firmware to return the outstanding commands with CS_ABORTED status. 
If at least one command has not finished by the time the wait expires, 
FAILED is returned.

The problem is how the wait is implemented. Here is the code:

------------------------------------------------------------------

static int
qla2x00_eh_wait_on_command(scsi_qla_host_t *ha, struct scsi_cmnd *cmd)
{
#define ABORT_POLLING_PERIOD    1000
#define ABORT_WAIT_ITER         ((10 * 1000) / (ABORT_POLLING_PERIOD))
        unsigned long wait_iter = ABORT_WAIT_ITER;
        int ret = QLA_SUCCESS;

        while (CMD_SP(cmd)) {
                msleep(ABORT_POLLING_PERIOD);

                if (--wait_iter)
                        break;
        }
        if (CMD_SP(cmd))
                ret = QLA_FUNCTION_FAILED;

        return ret;
}

------------------------------------------------------------------
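A side note on the loop as quoted: "if (--wait_iter) break;" leaves the 
loop whenever the decremented counter is still non-zero, which is already 
the case after the first sleep. The user-space sketch below only counts 
the sleeps; it is not driver code. The helper names and the 
"still_pending" flag are made up for illustration, and the negated 
variant is only my reading of what a ten-second poll would need, not 
something I have confirmed.

```c
#include <assert.h>

#define ABORT_POLLING_PERIOD    1000
#define ABORT_WAIT_ITER         ((10 * 1000) / (ABORT_POLLING_PERIOD))

/*
 * Hypothetical helper: counts how many times the quoted loop would sleep
 * if the command never completes.
 */
static unsigned long sleeps_as_quoted(void)
{
        unsigned long wait_iter = ABORT_WAIT_ITER;
        unsigned long sleeps = 0;
        int still_pending = 1;  /* stands in for CMD_SP(cmd) != NULL */

        assert(ABORT_WAIT_ITER > 0);

        while (still_pending) {
                sleeps++;       /* stands in for msleep(ABORT_POLLING_PERIOD) */

                if (--wait_iter)        /* condition exactly as quoted */
                        break;
        }
        return sleeps;
}

/*
 * Hypothetical variant: same loop with the test negated, so that it
 * leaves only once the counter reaches zero.
 */
static unsigned long sleeps_if_negated(void)
{
        unsigned long wait_iter = ABORT_WAIT_ITER;
        unsigned long sleeps = 0;
        int still_pending = 1;

        while (still_pending) {
                sleeps++;

                if (!--wait_iter)
                        break;
        }
        return sleeps;
}
```

With these definitions the quoted form sleeps once (about one second), 
while the negated form sleeps ABORT_WAIT_ITER times (about ten seconds).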

Where CMD_SP() is defined as
#define CMD_SP(Cmnd)            ((Cmnd)->SCp.ptr)

It's set to NULL just before cmd->scsi_done() is called.

This way of waiting races with the SCSI mid-level: while 
qla2x00_eh_wait_on_command() is sleeping in msleep(), the mid-level can 
free and reuse the command, so SCp.ptr can become non-NULL again. The 
waiter then sees a non-NULL pointer that belongs to a new use of the 
command, not the one it was waiting for, which could lead to the false 
errors above.
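To make the interleaving concrete, here is a small single-threaded 
user-space replay of it. Everything in it is hypothetical: "struct 
fake_cmd", the fake_*() helpers and the "serial" tag are stand-ins 
invented for the sketch, not the driver's or the mid-level's data 
structures. The tag check is just one possible way to tell two uses of 
the same structure apart, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct scsi_cmnd with only the needed fields. */
struct fake_cmd {
        void *sp;               /* plays the role of SCp.ptr / CMD_SP(cmd) */
        unsigned long serial;   /* hypothetical tag, bumped on every issue */
};

/* The driver queues the command: attach private data and tag this use. */
static void fake_issue(struct fake_cmd *cmd, void *sp)
{
        assert(sp != NULL);
        cmd->sp = sp;
        cmd->serial++;
}

/* The driver completes the command: the pointer is cleared. */
static void fake_complete(struct fake_cmd *cmd)
{
        cmd->sp = NULL;
}

/* The check the quoted loop relies on: non-NULL means still outstanding. */
static bool pending_by_pointer(const struct fake_cmd *cmd)
{
        return cmd->sp != NULL;
}

/* Alternative check: outstanding only if it is still the same use. */
static bool pending_by_serial(const struct fake_cmd *cmd,
                              unsigned long waited_serial)
{
        return cmd->sp != NULL && cmd->serial == waited_serial;
}

/*
 * Replays the interleaving and returns what the chosen check reports when
 * the waiter wakes up. "true" means a false "command still pending".
 */
static bool replay_race(bool use_serial)
{
        int sp_a, sp_b;         /* only their addresses are used */
        struct fake_cmd cmd = { NULL, 0 };
        unsigned long waited;

        fake_issue(&cmd, &sp_a);        /* outstanding when the reset starts */
        waited = cmd.serial;            /* the use the waiter is waiting for */

        /* the waiter sleeps here (msleep() in the driver) */
        fake_complete(&cmd);            /* the command comes back as aborted */
        fake_issue(&cmd, &sp_b);        /* the same structure is reused */
        /* the waiter wakes up here */

        return use_serial ? pending_by_serial(&cmd, waited)
                          : pending_by_pointer(&cmd);
}
```

In this replay replay_race(false) reports the command as still pending 
although the original use has completed, while replay_race(true) does not.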

Regards,
Vlad
