Hi Brett!
Thanks for the response. Hopefully we can gather enough data points to
help solve the problem.
The new PERC 5/i integrated firmware dated 11/21/2006 is at:
http://support.dell.com/support/downloads/format.aspx?c=us&l=en&s=gen&SystemID=PWE_2950&os=LIN4&osl=en&deviceid=9182&typecnt=2&libid=46&releaseid=R139225&vercnt=3
PERC 5/E adapter:
http://support.dell.com/support/downloads/format.aspx?c=us&l=en&s=gen&SystemID=PWE_2950&os=LIN4&osl=en&deviceid=9181&typecnt=2&libid=46&releaseid=R139227&vercnt=2
The release notes describe very similar symptoms, but I am not ready to
believe it yet as I can't reliably reproduce the problem well enough to
be confident of a fix, though it sounds like you might be able to.
Unfortunately we're using Debian at the moment, but if I can reproduce it I
can run on RHEL in a heartbeat to duplicate it for support (for now I'm
trying to minimize variables).
Also, which driver version are you running? I noticed you were using
some patches from Sumant Patro@LSI - is your driver identical to the one
in 2.6.19? If not, what does it look like?
Have you noticed any correlations with patrol reads at the times of the
failures? You can tell by running MegaCli -FwTermLog -Dsply -aALL
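If the log is long, something like the following rough sketch might help pick out the relevant entries. It assumes you have saved the MegaCli output to a text file first (the name fwtermlog.txt is just a placeholder), and it only does a case-insensitive search for the word "patrol"; I am not assuming anything else about the log format, so adjust the keyword to whatever your firmware actually prints.

```python
import sys


def patrol_lines(log_text, keyword="patrol"):
    """Return (line_number, line) pairs whose text mentions the keyword.

    The match is a plain case-insensitive substring search; the exact
    wording of patrol read entries in the firmware log is an assumption.
    """
    needle = keyword.lower()
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if needle in line.lower():
            hits.append((number, line.rstrip()))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python patrol_scan.py fwtermlog.txt
    with open(sys.argv[1], errors="replace") as handle:
        for number, line in patrol_lines(handle.read()):
            print(f"{number}: {line}")
```

Comparing the timestamps on the matching lines against the times of the hangs should show whether there is any overlap.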
What hardware are you running (CPUs, RAM, disk configuration)?
Have you noticed any correlation with heavy network I/O (as well as disk
I/O)? Some of our systems may have experienced this when running more
network load than typical.
Thanks!
Joe
Brett G. Durrett wrote:
I am still seeing this and we have between 2 and 5 failures per week
(across almost 20 machines). I am seeing it on ext3 (we migrated all
of the machines from XFS) and with ReadAhead disabled.
You mention a firmware update but I don't see any new PERC 5 firmware
packages on Dell's site... can you give me a pointer to the firmware
update?
Also, has anybody had this problem on RHEL? Dell does not support
Linux unless it is RHEL... I would be surprised if somehow RHEL did not
have this problem.
B-
Joe Malicki wrote:
I have the same or a similar issue running 2.6.17 SMP x86_64 - the
megaraid_sas driver hangs waiting for commands and then the filesystem
unmounts, leaving the machine in an unusable state until there is a hard
reboot (the machine is responsive but any access, shell or otherwise, is
impossible without the filesystem). While I do not have much debugging
information available, this happens to me about once every 6-7 days in
my pool of seven machines, so I can probably get debugging info. Since
the disk is offline and I can't get remote console, I don't have any
details except something similar to Dave Lloyd's post, below.
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html