Re: Aic94xx and Linux kernel 2.6.19

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Fri, 10 Nov 2006 15:53:20 -0800

[Hm, linux-scsi ought to be cc'd on this...]

mike.redan@xxxxxxx wrote:
>> Here they are:
>> Nov 10 02:08:08 192.168.207.10/192.168.207.10 kernel: sd 0:0:0:0: SCSI
>> error: return code = 0x00070000
>> Nov 10 02:08:08 192.168.207.10/192.168.207.10 kernel: end_request: I/O
>> error, dev sda, sector 77429847 
> 
> Yep, I've seen that now too.  It looks to me like we're getting
> DID_ERROR for some reason.  The only reason for that in the libata code
> seems to deal with bad SCSI commands and/or memory allocation problems,
> but I'll keep digging.
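
For anyone following along, here is a minimal sketch of how the
"return code = 0x00070000" above maps to DID_ERROR.  It assumes the
usual Linux packing of the SCSI result word (raw status in bits 0-7,
message byte in 8-15, host byte in 16-23, driver byte in 24-31) and the
host byte values from include/scsi/scsi.h.  The function and table names
are made up for illustration; they are not kernel code.

```python
# Illustrative only: split a Linux SCSI result word into its four bytes.
# Host byte values as defined in include/scsi/scsi.h.
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_scsi_result(result):
    """Return the raw status, message, host and driver bytes of a result."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


fields = decode_scsi_result(0x00070000)
print(HOST_BYTE_NAMES.get(fields["host"], "unknown"))
```

Only the host byte is set in the logged value; the status, message and
driver bytes are all zero, which fits a failure reported by the host
side rather than by the disk.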

These errors are memory allocation problems in libata.  When I plug a
whole lot of SAS and SATA disks into my x260 and run the pounder stress
test, the amount of memory held in buffers grows over a period of
about twenty minutes until libata can no longer allocate ata_queued_cmd
structures.  At that point we start seeing the errors above.  Since we
can't allocate new commands, libsas/aic94xx never even get called, which
is why they are silent on the matter.  However, if I kill pounder before
totally running out of memory, the buffers drain very rapidly and the
system is ok.

So, a question to you, Mr. Redan: What does /proc/meminfo look like at
crash time?  If you have a huge amount of buffers, then we're seeing the
same thing.
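
A rough sketch of one way to pull the Buffers figure out of
/proc/meminfo, in case it helps with capturing the numbers.  The helper
names are hypothetical, and what counts as "huge" is left to the reader;
the script only reports Buffers as a share of MemTotal.

```python
import os


def parse_meminfo(text):
    """Return {field: integer value} from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Most fields read "Name:  12345 kB"; a few carry no unit.
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def buffers_share(info):
    """Fraction of MemTotal currently reported as Buffers."""
    return info["Buffers"] / info["MemTotal"]


# Only attempt the read where /proc/meminfo exists (i.e. on Linux).
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    if "Buffers" in info and "MemTotal" in info:
        print("Buffers: %d kB (%.1f%% of MemTotal)"
              % (info["Buffers"], 100 * buffers_share(info)))
```

Running this every few seconds while pounder is going, and again after
killing it, should show whether the figure climbs and then drains the
way it does here.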

And a question for everyone else: Because the buffers drain out fairly
quickly after pounder dies, does this mean that the controller is being
subjected to too much I/O at once?

--D