Replying at the end.

On Apr 4, 2012, at 5:26 AM, Crunch wrote:

> On 04/03/2012 05:58 PM, Tony Schreiner wrote:
>> Two weeks ago I (clean-)installed CentOS 6.2 on a server which had been
>> running 5.7.
>>
>> There is a 16-disk, ~11 TB data volume on an Areca ARC-1280 RAID card,
>> with LVM + an xfs filesystem on it. The included arcmsr driver module
>> is loaded.
>>
>> At first it seemed OK, but within a few hours I started getting I/O
>> error messages on directory listings, and a bit later the output of a
>> vgdisplay command contained garbage.
>
> The file system data are being corrupted. This can only happen through
> human intervention or hardware failure, assuming that the original
> installation was okay. That is a safe assumption to make considering
> you've reinstalled and it now seems to be okay.
>
>> I then ran the volume check in the RAID card BIOS, and it flagged 3
>> errors. When I restarted the system, things were OK, but then the
>> problem reappeared. I ran another volume check and no errors were
>> flagged (I should note, the check takes about 9 hours). Upon
>> restarting, the file system was OK at first, but then went bad again.
>
> Presumably the card BIOS runs checks only on the firmware and/or the
> hardware, say the disks and the card itself. The reported errors
> therefore point to those components.
>
>> Another symptom was that the cli64 RAID management utility, which I got
>> from the Areca site, would just hang.
>
> I would guess the utility is a piece of client code that queries the
> firmware. Assuming nothing is wrong with the client code, this implies
> some form of defect in the firmware: either unresponsive hardware or
> corrupt firmware code.
>
>> After a couple of days of this, I decided I could not afford to have
>> this system unavailable, and I reinstalled CentOS 5.8. Everything has
>> been fine since.
>
> The firmware and file system may well have corrected the errors on your
> first pass.
> But then for the corruption to happen again without any detected errors
> sounds inconsistent. There's something missing here. Maybe the card
> corrected the errors itself the second time, leaving corruption behind.
>
> _______________________________________________
> CentOS mailing list
> CentOS@xxxxxxxxxx
> http://lists.centos.org/mailman/listinfo/centos

I'm not sure what you're saying can be entirely true. What I failed to
mention in the original post is that I did not recreate the problematic
data volume during either install; it was preserved both for the upgrade
and the downgrade.

It doesn't appear that there is any filesystem corruption independent of
the RAID software; xfs_check doesn't discover any.

I'm willing to believe that the RAID firmware is problematic, but it
seems to be an issue with version 6 and not version 5. I'm in the process
of reporting it to the BugTracker. As KB mentioned, there is an existing
id 5517.

Tony Schreiner