[RFC] SCSI timeout error recovery issue

Hi,

This is my first post to this list. I would appreciate your comments
on a problem I have found.


PROBLEM DESCRIPTION
===================

I'm evaluating how systems behave with broken devices that generate
scsi timeout errors. During this test, I found that a userspace test
program took an unexpectedly long time to run.

In the scsi mid layer, the state of a device is changed from "running"
to "offline" when an error is detected and its recovery fails.
In this test, however, failed devices do not stay "offline" but go
back to "running", so I/Os are issued to them again and scsi timeout
errors occur many times. This is why the test program takes so long.

When the scsi mid layer detects a timeout for an I/O to a device and
recovery fails, it changes that device's state from "running" to
"offline". When it then detects a timeout on another device and
recovery fails again, the second device is changed from "running" to
"offline", but at the same time the first device, which was "offline",
goes back to "running".

This means that when timeout errors happen on several devices, previous
errors are forgotten and the earlier devices are set back to "running".
A test application can therefore access non-responding devices
repeatedly, and a timeout error happens each time.

Here is an example of the sequence with two non-responding devices,
A and B. When a test program accesses them alternately, the
following sequence occurs.

  0. Original states of devices
       device A: "running", device B: "running"

  1. Device A is accessed.
  2. A scsi timeout happens on device A and its recovery fails.
  3. The state of device A is changed to "offline".
       device A: "offline", device B: "running"
  4. Device B is accessed.
  5. A scsi timeout happens on device B and its recovery fails.
  6. The states of the devices are changed as follows.
       device A: "running", device B: "offline"

  7. (go back to step 1 and repeat)


BRIEF SOURCE CODE ANALYSIS
==========================

During the recovery procedure for a scsi timeout error, the qla2xxx
driver calls fc_remote_port_add() and fc_remote_port_delete(). These
call scsi_device_set_state() internally and change the state of
devices. At the beginning of the recovery procedure, all devices are
put into the SDEV_BLOCK state. At the end of the recovery procedure
they go back to the SDEV_RUNNING state, and finally only the device
whose error triggered the recovery goes to the SDEV_OFFLINE state.
As a result, the previous states of the other devices are lost.
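
The state handling described above can be sketched as a small userspace
model. This is only an illustration of the observed behavior, not
kernel code: the enum values and model_failed_recovery() are
hypothetical names, and the real transitions go through
scsi_device_set_state().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for SDEV_RUNNING, SDEV_BLOCK and SDEV_OFFLINE. */
enum model_state { MODEL_RUNNING, MODEL_BLOCK, MODEL_OFFLINE };

/*
 * Hypothetical model of one failed recovery pass as described above:
 * every device is blocked, every device is set back to running, and
 * only the device that triggered the recovery ends up offline.
 */
static void model_failed_recovery(enum model_state *dev, size_t ndev,
				  size_t failed)
{
	size_t i;

	assert(failed < ndev);

	for (i = 0; i < ndev; i++)
		dev[i] = MODEL_BLOCK;	/* beginning of recovery */
	for (i = 0; i < ndev; i++)
		dev[i] = MODEL_RUNNING;	/* end of recovery: an earlier
					 * "offline" state is lost here */
	dev[failed] = MODEL_OFFLINE;
}
```

Calling model_failed_recovery() for device 0 and then for device 1 on a
two-element array leaves device 0 in MODEL_RUNNING and device 1 in
MODEL_OFFLINE, which matches the A/B sequence in the problem
description.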


TEST ENVIRONMENT
================

The kernel version is 2.6.29, and the disk devices are connected
through Fibre Channel using the qla2xxx driver. Scsi timeouts are
generated by a self-made scsi timeout injection module, which wraps the
scsi queuecommand handler and drops I/Os to the specified devices, so
that those I/Os are never passed to the qla2xxx driver.

For details, please see the source code listed in Appendix B.


REQUEST FOR COMMENTS
====================

- My expectation is that the scsi mid layer disables failed devices and
  keeps them in the "offline" state. Is the current scsi behavior
  correct and intended?

- Which component, the scsi mid layer or the qla2xxx driver, is working
  incorrectly and needs to be fixed? Or is the method used to inject
  scsi timeouts wrong?

- How can we keep non-responding devices in the "offline" state to
  prevent scsi timeout errors from happening again and again on the
  same devices? I'm afraid that deleting device files such as /dev/sdx
  is not a solution when an application accesses devices through
  device-mapper.


APPENDIX A. REPRODUCTION STEPS
==============================

0. Environment
    kernel ... 2.6.29
    scsi LLD ... qla2xxx
    devices ... /dev/sdc (2:0:0:0), /dev/sdd (2:0:0:1)
    scsi timeout ... 3 seconds.

1. Getting the address of the scsi_host_template for the LLD

  Get the address of the scsi_host_template table specific to the LLD.
  For the qla2xxx driver, the table is named "qla2x00_driver_template".

    # grep qla2x00_driver_template /proc/kallsyms
    f8a323c0 d qla2x00_driver_template      [qla2xxx]

2. Building and loading the scsi timeout injection module

  Load the scsi timeout injection module with a "param" option, which
  takes three comma-separated values: the scsi_host_template address
  obtained in step 1 and the two scsi device targets on which timeout
  errors are injected.

  Here is an example that injects scsi timeouts on scsi devices
  2:0:0:0 and 2:0:0:1.

    # insmod scsi_timeout.ko param=0xf8a323c0,2:0:0:0,2:0:0:1

3. Checking device states

  Both devices, /dev/sd[cd], are now in "running" state.
 
    # cat /sys/block/sdc/device/state
    running
    # cat /sys/block/sdd/device/state
    running

4. Issuing I/Os to the first device (/dev/sdc)

  Issue I/Os to the first device. This takes about 76 seconds.

    # dd if=/dev/sdc of=/dev/null bs=4096 count=100
    dd: reading `/dev/sdc': Input/output error
    0+0 records in
    0+0 records out
    0 bytes (0 B) copied, 75.4365 seconds, 0.0 kB/s

5. Checking device states

  The first device (/dev/sdc) has changed to "offline".

    # cat /sys/block/sdc/device/state
    offline
    # cat /sys/block/sdd/device/state
    running

6. Issuing I/Os to the second device (/dev/sdd)

  Issue I/Os to the second device. This also takes about 76 seconds.

    # dd if=/dev/sdd of=/dev/null bs=4096 count=100
    dd: reading `/dev/sdd': Input/output error
    0+0 records in
    0+0 records out
    0 bytes (0 B) copied, 75.9649 seconds, 0.0 kB/s

7. Checking device states

  The second device (/dev/sdd) has changed to "offline", but the first
  device (/dev/sdc) has changed back to "running".

    # cat /sys/block/sdc/device/state
    running
    # cat /sys/block/sdd/device/state
    offline

8. Issuing I/Os to the first device (/dev/sdc) again

  I/Os to the first device take about 76 seconds once again, because
  the first device is in the "running" state and the I/Os issued by the
  dd command are sent to the device.

    # dd if=/dev/sdc of=/dev/null bs=4096 count=100
    dd: reading `/dev/sdc': Input/output error
    0+0 records in
    0+0 records out
    0 bytes (0 B) copied, 75.8986 seconds, 0.0 kB/s
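
The state checks in steps 3, 5 and 7 could also be done from a test
program instead of with cat. The following is a minimal userspace
sketch; read_state_file() and state_is_offline() are hypothetical
helper names, and the only assumption taken from the steps above is
that the state file holds a single line such as "running" or "offline".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a sysfs-style attribute file into buf and
 * strip the trailing newline. Returns 0 on success, -1 on failure.
 */
static int read_state_file(const char *path, char *buf, size_t len)
{
	FILE *fp;

	assert(buf != NULL && len > 0);

	fp = fopen(path, "r");
	if (!fp)
		return -1;
	if (!fgets(buf, (int)len, fp)) {
		fclose(fp);
		return -1;
	}
	fclose(fp);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Returns 1 if the file reports "offline", 0 if it reports any other
 * state, and -1 if the file cannot be read.
 */
static int state_is_offline(const char *path)
{
	char buf[32];

	if (read_state_file(path, buf, sizeof(buf)) != 0)
		return -1;
	return strcmp(buf, "offline") == 0;
}
```

For example, state_is_offline("/sys/block/sdc/device/state") could be
called before each dd run to see whether the device went back to
"running".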


APPENDIX B. SCSI TIMEOUT INJECTION MODULE
=========================================

/*
 * scsi timeout injection module
 */
#include <linux/module.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_host.h>
#include <scsi/scsi_device.h>

static struct scsi_host_template *sht;
static char config[32];

static struct target {
	short host;
	uint channel;
	uint id;
	uint lun;
} st[2];

static int (*org_qc)(struct scsi_cmnd *,
		     void (*done)(struct scsi_cmnd *));


static inline int check_dev(struct target *st, struct scsi_cmnd *cmd)
{
	return (st->host == cmd->device->host->host_no &&
		st->channel == cmd->device->channel &&
		st->id == cmd->device->id &&
		st->lun == cmd->device->lun);
}

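/*
 * Replacement queuecommand handler. Commands addressed to either
 * injection target are dropped: 0 is returned without calling done()
 * or the LLD, so the command never completes and the scsi timeout
 * fires. All other commands are passed to the original handler.
 */
static int dbg_qc(struct scsi_cmnd *cmd,
		  void (*done)(struct scsi_cmnd *))
{
	int ret = 0;

	/* Pairs with synchronize_sched() in the module exit path. */
	preempt_disable();
	if (check_dev(&st[0], cmd) || check_dev(&st[1], cmd))
		goto out;
	ret = org_qc(cmd, done);
out:
	preempt_enable();
	return ret;
}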

static int __init scsi_timeout_module_init(void)
{
	int ret;

	/* param format: <scsi_host_template address>,<h:c:i:l>,<h:c:i:l> */
	ret = sscanf(config, "%lx,%hd:%u:%u:%u,%hd:%u:%u:%u",
		     (ulong *)&sht,
		     &st[0].host, &st[0].channel, &st[0].id, &st[0].lun,
		     &st[1].host, &st[1].channel, &st[1].id, &st[1].lun);
	if (ret != 9) {
		printk(KERN_INFO "scsi_timeout_module: invalid options\n");
		return -EINVAL;
	}

	org_qc = sht->queuecommand;
	sht->queuecommand = dbg_qc;

	printk(KERN_INFO
	       "scsi timeout injection: %hd:%u:%u:%u %hd:%u:%u:%u\n",
	       st[0].host, st[0].channel, st[0].id, st[0].lun,
	       st[1].host, st[1].channel, st[1].id, st[1].lun);

	return 0;
}

static void __exit scsi_timeout_module_exit(void)
{
	sht->queuecommand = org_qc;
	synchronize_sched();
}

module_init(scsi_timeout_module_init);
module_exit(scsi_timeout_module_exit);
module_param_string(param, config, 32, 0);

MODULE_AUTHOR("Takahiro Yasui <tyasui@xxxxxxxxxx>");
MODULE_LICENSE("GPL");


Again, I would appreciate your comments.

Regards,
---
Takahiro Yasui
Hitachi Computer Products (America), Inc.


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
