On Tue, May 17, 2005 at 11:39:08PM -0600, Frank L. Setinsek wrote:
May 17 21:53:52 compute-0-2.local kernel: mptscsih: ioc0: WARNING - Device (0:0:1) reported QUEUE_FULL!
May 17 21:53:52 compute-0-2.local kernel: SCSI disk error : host 0 channel 0 id 0 lun 1 return code = 440b0000
I would suspect this is an issue with tagged queueing.
Tagged queueing lets a host tag each I/O request with an identifier so the I/O subsystem can answer the requests in a different order than they were issued. The host queries the device to find out how large the queue can be. If you have several hosts, all assuming they have the whole queue to themselves, they could easily fill it, and the device would then report QUEUE_FULL as in your log.
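To make the overcommit concrete, here is a rough sketch in Python. The device queue depth of 64 is a made-up number for illustration (I don't know what your device actually supports), and the function name is hypothetical:

```python
def worst_case_outstanding(hosts, per_host_depth):
    """Worst-case number of tagged commands in flight if every host
    fills the queue depth it believes it owns."""
    return hosts * per_host_depth


# Assumed figure, not taken from any real device's documentation.
DEVICE_QUEUE_DEPTH = 64
HOSTS = 6

# Each host queried the device and assumes the whole queue is its own.
demand = worst_case_outstanding(HOSTS, DEVICE_QUEUE_DEPTH)
overcommitted = demand > DEVICE_QUEUE_DEPTH

print(f"device can hold {DEVICE_QUEUE_DEPTH}, hosts may send {demand}")
print("queue can overflow" if overcommitted else "queue cannot overflow")
```

With six hosts each assuming the full depth, the worst-case demand is six times what the device can hold, so QUEUE_FULL under load would not be surprising.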
Read the documentation for your device to see what its tagged queue depth is, and whether it can be configured. Then find out how to set the queue depth in your SCSI driver. Some drivers let you set it per target in a config file. On each node, set the driver's max queue depth for the device to 1/6 of the total queue depth on the device (since you have a 6-node cluster).
Of course the easy test would be to disable tagged queueing completely, but the performance hit can be bad. Still, it would quickly show whether the problem goes away...
Remember that you will have to reconfigure the queue depth on all nodes before you can add a new node... So you may want to set the depth to 1/7 of the total so there is room for one more if these nodes run something you cannot restart often.
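The arithmetic for the per-node setting can be sketched like this in Python. Again, the total depth of 64 is only an assumed example value, and `per_node_queue_depth` is a hypothetical helper, not something your driver provides:

```python
def per_node_queue_depth(device_queue_depth, nodes, spare_nodes=0):
    """Split the device's total queue depth evenly across the nodes,
    optionally reserving equal shares for nodes to be added later.
    Rounds down so the shares never add up to more than the total."""
    if device_queue_depth < 1 or nodes < 1 or spare_nodes < 0:
        raise ValueError("need a positive depth, at least one node, "
                         "and a non-negative number of spare nodes")
    return device_queue_depth // (nodes + spare_nodes)


# Assumed figure; read the real one from the device documentation.
DEVICE_QUEUE_DEPTH = 64

# 6-node cluster, no headroom: each node gets 1/6 of the queue.
print(per_node_queue_depth(DEVICE_QUEUE_DEPTH, nodes=6))

# Same cluster, leaving room for one more node: 1/7 each.
print(per_node_queue_depth(DEVICE_QUEUE_DEPTH, nodes=6, spare_nodes=1))
```

If the result comes out as 0 or 1, the device queue is too small to share this way and you are effectively back to running without tagged queueing.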
-- birger
-- Linux-cluster@xxxxxxxxxx http://www.redhat.com/mailman/listinfo/linux-cluster