On Wednesday 07 March 2012 11.17.15 m.roth@xxxxxxxxx wrote:
> Got a bunch of servers from Penguin. Supermicro m/b's H8QG6. We put a 3tb
> drive in for additional workspace for the users, and some of them won't
> read, others will go for weeks, then spit out DRDY errors. lshw shows the
> controller as an ATI SB7x0/SB8x0/SB9x0 SATA.
...
> Now, I've been working on one with Penguin. I noticed one thing, that it
> was set to native IDE. After googling, I saw that the most recent spec,
> which included EIDE, should be good to petabytes... but I tried resetting
> it to AHCI anyway.
>
> The user ran one job, ok... then another last night, and it's spitting the
> same errors.
...
> Mar 7 00:53:28 <server> kernel: ata2.00: failed command: WRITE FPDMA QUEUED
...
> 40/00:04:20:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
...
> Mar 7 00:53:28 <server> kernel: ata2: hard resetting link

The drive timed out during a write, and the link to it was then hard
reset. This is not normal and usually points to a bad drive or buggy
firmware.

Have you had a look at the SMART data for the drive(s)? (You may want to
run the SMART self-tests.)

Also, I'd suggest you test it in a controlled environment. For example, can
any of your drives survive a full surface write? (dd if=/dev/zero bs=1M
of=..) A full surface read?

Run the tests against /dev/sdX directly, to rule out partitioning,
filesystems, volume management, etc.

Do note that writing your drive full of zeros _will_ destroy your data (I
really hope that's stating the obvious...).

/Peter
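
The full surface read suggested above can also be scripted, so that a
failing offset is recorded instead of the run stopping at the first error.
What follows is only a minimal sketch under stated assumptions, not a
replacement for dd or the SMART self-tests: the function name surface_read
and the 1 MiB chunk size (chosen to mirror bs=1M) are illustrative, and
the reads go through the page cache, so a second pass over the same
region may not touch the disk at all. It only reads, so it does not
destroy data.

```python
import os

CHUNK = 1024 * 1024  # 1 MiB per read, mirroring dd's bs=1M (an assumption)


def surface_read(path, chunk=CHUNK):
    """Sequentially read `path` (a block device or a regular file).

    Returns (bytes_read, bad), where `bad` is a list of
    (offset, errno) pairs for chunks that raised an I/O error.
    """
    bad = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for files and block devices alike.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError as exc:
                # Record the failing chunk and move on to the next one.
                bad.append((offset, exc.errno))
                offset += want
                continue
            if not data:
                break  # unexpected end of device or file
            total += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return total, bad
```

Run as root against the whole device, for example
surface_read("/dev/sdX"), and compare any offsets in the returned list
with the kernel log timestamps.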
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos