On Thu, Apr 16, 2009 at 07:53:59AM -0400, Kyle Brandt wrote:
>
> On several of my servers I seem to have a high rate of server crashes due to
> file system errors. So I have some questions related to this:
>
> Is there any Mean Time Between Failure (MTBF) data for the ext3
> file system?
>
> Does increased partition size cause a higher risk of the partition being
> corrupted? If so, is there any data on the ratio between partition size and
> the likelihood of failure?

The probability of these sorts of filesystem problems is going to be
dominated by hardware-induced corruption, so it doesn't make much sense to
talk about MTBF without having a specific hardware context in mind. If you
have lousy memory, a lousy disk controller cable, or a loose cable connector,
then corruption will happen often. If you are located somewhere with a strong
alpha particle source, you will have a much greater chance of bit flips. If
you use ECC memory and do very careful hardware selection, with
enterprise-quality disks that trade off disk capacity for much stronger ECC
codes, then of course the MTBF will be much longer; that is, failures will be
much rarer.

(For example, there was the infamous story in the early 1990's when Sun had a
spate of bad memory; I think it was ultimately traced to radioactive
contamination of the ceramic materials used to make their memory chips. The
contamination emitted alpha particles, which caused "bit flips"; this made
their customers rather antsy, especially since Sun tried to deny there was
even a problem for quite some time.)

So if you are having a high rate of server crashes, the first thing I would do
is make sure you have the latest distribution updates; it's possible the
crashes are caused by a kernel bug that has since been fixed, although that's
somewhat unlikely. The next thing I would do is take one of the machines that
has been crashing offline and run a 36-48 hour memory test.
> Does ext3 on hardware RAID (10) increase the possibility of file system
> corruption?

No, it shouldn't, unless you have a buggy or otherwise dodgy hardware RAID
controller.

- Ted

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users