pagecache corruption on Tyan S3870

A couple of months ago I reported some problems with a batch of Tyan K8SSA (S3870) based machines. We are continuing to have an odd problem with these boxes, and if anyone has seen something similar elsewhere, I'd appreciate hearing about it.

These boxes are running CentOS 4.4 x86_64 with kernel 2.6.9-42.0.3.ELsmp. They have dual Opteron 265s (dual core) with 4x2GB DIMMs. The DIMMs used to be mixed sizes, but Tyan recommended making them all the same, and the vendor made the substitutions. We have also clocked the memory down from 400 MHz to 266 MHz, again on the advice of Tyan.

The symptom is that some large files (700MB to more than 1GB), opened for read and then closed, show corruption in the pagecache. One or more 4k blocks in a file will be completely trashed, as if a random page of other data had been substituted. A reboot or a flush of the pagecache fixes the problem, so the corruption is only in the pagecache, not on disk. We run regular MD5 checksums of the files, which is how we detect the problem, in addition to our application crashing from time to time.
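
For anyone who wants to look for the same thing on their own boxes, here is a rough sketch of a per-block check in Python. It is not our actual checking script; the function names and the 4096-byte block size are assumptions chosen to match the 4k corruption unit described above. The idea is to record one MD5 digest per block while the file is known to be good (for example, right after a reboot), then compare against a later read to see which blocks changed.

```python
import hashlib

# Assumed block size: 4096 bytes, matching the 4k blocks that get trashed.
BLOCK_SIZE = 4096


def block_digests(path, block_size=BLOCK_SIZE):
    """Return a list of MD5 hex digests, one per block_size chunk of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            digests.append(hashlib.md5(block).hexdigest())
    return digests


def differing_blocks(good, suspect):
    """Return the indices of blocks whose digests differ between two lists."""
    bad = [i for i, (a, b) in enumerate(zip(good, suspect)) if a != b]
    # Blocks present in only one of the two lists also count as differences.
    shorter, longer = sorted((len(good), len(suspect)))
    bad.extend(range(shorter, longer))
    return bad
```

Multiplying a reported block index by the block size gives the byte offset of the bad page, which makes it easier to dump that page and see what data ended up there.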

We have some older Tyan motherboards that don't show this problem. At this point it seems it is either a hardware problem or a kernel motherboard-support problem, but it's pretty baffling.

Thanks,
Dan
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos
