Hi all,
I've been asked to look into a SCSI problem. I know my way around the
kernel, but I'm new to SCSI/disk operations, so please bear with me (and
educate me) if my terminology is off.
We have an x86-based blade with dual LSI 53c1030 devices.
We recently moved to a new kernel version, and now some of our userspace
apps are getting QUEUE_FULL/SAM_STAT_TASK_SET_FULL errors when issuing
SCSI requests on the sg device nodes. The requests are generally
related to the health of the disks (e.g., LOG_SENSE, REQUEST_SENSE,
TEST_UNIT_READY, MODE_SENSE_10, that sort of thing).
We had been using a vendor-supplied 2.6.10 kernel with version 3.01.18
of the Fusion MPT driver. The new kernel is based on 2.6.14 and uses
version 3.02.57 of the Fusion MPT driver.
So, first, what is the proper way to handle this type of error in
userspace? Should the app immediately retry, or wait a bit then retry?
Should there be a cap on the number of retries? If so, what's a
reasonable limit?
Second, can anyone think of why we would suddenly get more of these
errors when moving between these kernel versions when using the same
hardware?
Third, what do we do to get rid of the errors? Updating the driver may
be an option; updating the kernel itself is not.
Thanks,
Chris