On Friday 16 April 2010, Andre Noll wrote:
> On 09:36, Andrew Vasquez wrote:
> > > > qla2xxx 0000:06:09.0: scsi(0:0:0): Abort command issued -- 1 fa6a73
> > > > 2002.
> > > >
> > > > I can't explain why the storage did not complete the request in the
> > > > allotted time.
> > >
> > > Ah, that's valuable information, thanks. The underlying Infortrend
> > > Raid System is rather old but worked without any problems for several
> > > years. We recently replaced its 400G disks by new 2T WD disks. Maybe
> > > the new disks have longer response times, could that be the reason? And
> > > is there a way to increase the timeout value?
> >
> > To update the default timeout value (30 seconds) for commands
> > submitted to /dev/sdn to 60 seconds:
> >
> > $ echo 60 > /sys/block/sdn/device/timeout
>
> I will re-run the stress test with a 60 seconds timeout value and follow
> up if this did not help.

That will not help if the command is "SYNCHRONIZE_CACHE", as that
ignores the device setting and uses the SCSI default timeout (30s),
which is far too small for SATA-based RAID units. The SCSI maintainers
ignored that patch and a couple of others I wrote to improve error
handling with Infortrend units. I will send the patches again soon.

Also, if the abort command succeeds, the command should be re-queued
and should not result in an error. I think my patches would also
increase verbosity to point out what exactly happened (possibly a wrong
return code in the qla2xxx driver, although that should activate the
next step in error handling; I need to find some time to go through
the code...).

Altogether, this is unrelated to the filesystem. The filesystem just
might be the reason for a synchronize-cache, e.g. barriers, etc.

Greetings from Tübingen,
Bernd
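
For reference, the sysfs timeout change discussed above can be wrapped in two small shell helpers. This is only a sketch under stated assumptions: the function names and the SYSFS_BLOCK variable are hypothetical (they are not kernel or distribution interfaces) and exist so the helpers can be tried against a scratch directory first. According to the reply above, this per-device value is not used for SYNCHRONIZE_CACHE, so the helpers only cover ordinary commands.

```shell
#!/bin/sh
# Sketch: read and raise the per-device SCSI command timeout via sysfs.
# SYSFS_BLOCK is a hypothetical override for testing; when unset, the
# real /sys/block tree is used.

# Print the current timeout in seconds for a block device such as "sdn".
get_scsi_timeout() {
    cat "${SYSFS_BLOCK:-/sys/block}/$1/device/timeout"
}

# Write a new timeout in seconds for one device.
# On a real system this needs root privileges.
set_scsi_timeout() {
    dev=$1
    secs=$2
    file="${SYSFS_BLOCK:-/sys/block}/$dev/device/timeout"
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi
    echo "$secs" > "$file"
}
```

A typical use would be `get_scsi_timeout sdn` to check the current value, then `set_scsi_timeout sdn 60` as root. A value written through sysfs is not stored anywhere persistent, so it has to be set again after a reboot.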