> What I said was that timeouts occurring due to transmission errors
> should be recoverable. It seems like IRQ delivery didn't work, probably
> due to a screaming IRQ. I need to see the messages before the first
> relevant error message. It's always a good idea to post the full kernel
> log from boot till failure. Things which don't seem relevant are often
> relevant.

Naturally. Full kern.log with boot:
http://www.huweb.hu/maques/tmp/jmicron/kern.log
(no edits, there are really only those 2 lines between Feb 6 and Feb 9's 1st
exception)

Previously the machine ran kernel 2.6.23.9, and back then I noticed the
following in syslog:
Feb 6 19:10:19 storage1 kernel: ata4: D2H reg with I during NCQ, this message won't be printed again
Feb 6 19:10:20 storage1 kernel: ata1: D2H reg with I during NCQ, this message won't be printed again
Feb 6 19:10:20 storage1 kernel: ata2: D2H reg with I during NCQ, this message won't be printed again
Feb 6 19:10:21 storage1 kernel: ata3: D2H reg with I during NCQ, this message won't be printed again
I googled and saw that there were some fixes related to this (maybe
they were yours), which is why we hoped that 2.6.24 would fix it. The
above error messages are indeed gone, but...

> So far, no problem of this kind has been tracked down to the MB or
> the controller, while 90% of hardware problems turned out to be power
> related.

I'll put a brand new, probably different PSU in the case and connect the
MB and the 4 disks of the problematic controller to it, and connect the
2 system disks and the other 4 disks to this one (or even another one).
Meanwhile, I'd welcome any suggestion as to why a controller reset is
causing a "fatal error"...
BTW, the drives were accessible after the array broke (when I got there).
Thanks,
Gabor