getting QUEUE_FULL/SAM_STAT_TASK_SET_FULL from userspace sg operations

"Chris Friesen" <cfriesen@xxxxxxxxxx> · Fri, 09 Nov 2007 15:59:25 -0600

Hi all,

I've been asked to look into a SCSI problem.  I know my way around the 
kernel, but I'm new to SCSI/disk operations, so please bear with me (and 
educate me) if my terminology is off.

We have an x86-based blade with dual LSI 53c1030 devices.

We recently moved to a new kernel version, and now some of our userspace 
apps are getting QUEUE_FULL/SAM_STAT_TASK_SET_FULL errors when issuing 
SCSI requests on the sg device nodes.  The requests were generally 
related to the health of the disks (i.e., LOG_SENSE, REQUEST_SENSE, 
TEST_UNIT_READY, MODE_SENSE_10, that sort of thing).

We had been using a vendor-supplied 2.6.10 kernel with version 3.01.18 
of the Fusion MPT driver.  The new kernel is based on 2.6.14 and uses 
version 3.02.57 of the Fusion MPT driver.

So, first: what is the proper way to handle this type of error in 
userspace?  Should the app retry immediately, or wait a bit and then 
retry?  Should there be a cap on the number of retries?  If so, what's 
a reasonable limit?

Second, can anyone think of why we would suddenly get more of these 
errors when moving between these kernel versions when using the same 
hardware?

Third, what do we do to get rid of the errors?  Updating the driver may 
be an option; updating the kernel itself is not.

Thanks,

Chris