Problem handling task management functions in qla2xxx

Hello,

If a task management function is issued (e.g. with the sg_reset utility, which is the easiest way) during active IO to a qla2xxx device (ISP2422), it often fails with messages like:

------------------------------------------------------------------

qla2xxx 0000:04:02.0: scsi(13:0:1): DEVICE RESET ISSUED.
qla2xxx 0000:04:02.0: qla2xxx_eh_device_reset: failed while waiting for commands

------------------------------------------------------------------

This can break the SCSI mid-level's error recovery and cause it to erroneously take device(s) offline when they are actually healthy.

I investigated and found that the driver waits a limited time for the firmware to finish aborting the outstanding commands with CS_ABORTED status. If at least one command has not finished when the wait expires, FAILED is returned.

The problem is how the wait is implemented. Here is the code:

------------------------------------------------------------------

static int
qla2x00_eh_wait_on_command(scsi_qla_host_t *ha, struct scsi_cmnd *cmd)
{
#define ABORT_POLLING_PERIOD    1000
#define ABORT_WAIT_ITER         ((10 * 1000) / (ABORT_POLLING_PERIOD))
        unsigned long wait_iter = ABORT_WAIT_ITER;
        int ret = QLA_SUCCESS;

        while (CMD_SP(cmd)) {
                msleep(ABORT_POLLING_PERIOD);

                if (--wait_iter)
                        break;
        }
        if (CMD_SP(cmd))
                ret = QLA_FUNCTION_FAILED;

        return ret;
}

------------------------------------------------------------------

Where CMD_SP() is defined as
#define CMD_SP(Cmnd)            ((Cmnd)->SCp.ptr)

It's set to NULL just before cmd->scsi_done() is called.
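
To see how long the quoted loop actually waits, here is a small user-space model of its control flow. It is only a sketch: model_cmd, model_msleep(), model_sleeps and model_wait_on_command() are made-up names, the sleep is replaced by a counter, and it assumes the snippet above is quoted accurately. With the `if (--wait_iter) break;` test as written, the counter is 9 after the first pass, so the loop seems to leave after a single polling period instead of ABORT_WAIT_ITER of them.

```c
#include <assert.h>
#include <stddef.h>

#define ABORT_POLLING_PERIOD    1000
#define ABORT_WAIT_ITER         ((10 * 1000) / (ABORT_POLLING_PERIOD))

/* Hypothetical stand-in for struct scsi_cmnd: only what CMD_SP() reads. */
struct model_cmd {
        void *sp;
};

/* Counts sleeps instead of sleeping, so the model runs instantly. */
static unsigned long model_sleeps;

static void model_msleep(unsigned int ms)
{
        (void)ms;
        model_sleeps++;
}

/* Same control flow as the quoted loop; returns 0 on success, 1 on failure. */
static int model_wait_on_command(struct model_cmd *cmd)
{
        unsigned long wait_iter = ABORT_WAIT_ITER;

        assert(cmd != NULL);
        model_sleeps = 0;

        while (cmd->sp) {
                model_msleep(ABORT_POLLING_PERIOD);

                /* Non-zero (9) after the first pass, so this breaks at once. */
                if (--wait_iter)
                        break;
        }

        return cmd->sp ? 1 : 0;
}
```

In this model a command that is still outstanding is reported as failed after one simulated sleep, and a command whose pointer is already NULL returns success without sleeping at all.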

This way of waiting races with the SCSI mid-level: while qla2x00_eh_wait_on_command() is sleeping in msleep(), the mid-level can free and reuse the command, so SCp.ptr can become non-NULL again. The wait then sees a non-NULL pointer that belongs to a new command and treats the old one as still outstanding, which could lead to the false errors above.
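
The race can be illustrated with another user-space sketch. This is not the driver's code and not a proposed patch: model_cmd, its generation field and all model_* helpers are hypothetical, introduced only to contrast a pointer-only check with a check that also verifies the slot still holds the command that was being waited on. A real fix would also need proper locking, which the model leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command slot that the mid-level may recycle. */
struct model_cmd {
        void *sp;                 /* plays the role of SCp.ptr / CMD_SP() */
        unsigned long generation; /* hypothetical: bumped on every reuse */
};

/* Completion path: the pointer is cleared before the command is handed back. */
static void model_complete(struct model_cmd *cmd)
{
        cmd->sp = NULL;
}

/* The mid-level reuses the same slot for a new, unrelated command. */
static void model_reuse(struct model_cmd *cmd, void *new_sp)
{
        assert(new_sp != NULL);
        cmd->generation++;
        cmd->sp = new_sp;
}

/* Pointer-only check, as in the quoted loop: 1 means "still outstanding". */
static int model_pending_by_ptr(const struct model_cmd *cmd)
{
        return cmd->sp != NULL;
}

/* Identity check: outstanding only if it is still the command we waited on. */
static int model_pending_by_generation(const struct model_cmd *cmd,
                                       unsigned long waited_generation)
{
        return cmd->sp != NULL && cmd->generation == waited_generation;
}
```

If the slot is completed and then reused while the waiter sleeps, model_pending_by_ptr() still reports the command as outstanding, while model_pending_by_generation() with the generation recorded before the sleep does not.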

Regards,
Vlad
