On Jul 07, 2009 18:16 +0000, Evan King wrote:
> So my questions are these:
>
> - How likely is it that some arcane bug in ext4 is responsible for the failure?

It is possible - there are still bugs being fixed in ext4.

> - What can I do to track the occurrence of this bug, its source, and/or the
>   conditions that may trigger it?  (Note that iostat shows nothing of interest,
>   as the actual I/O load isn't particularly unusual.)

Reporting the actual kernel version you are using is critical.  If you are
going to stick with ext4, I would follow the latest FC11 kernels, since
there is an active maintainer for ext4 at Red Hat.

Depending on how you formatted the filesystem, you may be able to revert
to ext3 if you want more stability.  Providing the output of "dumpe2fs -h"
for the filesystem will tell (in particular the "features" line).

> - Should I seriously consider using an SSD?  (NFS will not share
>   memory-mapped directories, which thwarted the last of my 'better' plans,
>   and the software's scratch directory can potentially grow to several gigs
>   over the span of a few days/weeks.)

That is an independent question from using ext4.  If you are using NFS
without "async", then an SSD will almost certainly help performance, but
it is probably completely unrelated to the corruption issue.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
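
For anyone wanting to check the "features" line mentioned above, here is a minimal sketch of one way to do it. It is not part of the original reply. The helper names and the EXT4_ONLY_FEATURES set are assumptions made for illustration; the set is taken from the stock ext4 profile in mke2fs.conf and is not exhaustive. If any of these features appear, reverting to ext3 is unlikely to be a simple remount.

```python
import subprocess

# ASSUMPTION: features enabled by the stock mke2fs.conf ext4 profile that a
# plain ext3 mount is not expected to support. Illustrative, not exhaustive.
EXT4_ONLY_FEATURES = {
    "extent", "flex_bg", "huge_file", "uninit_bg", "dir_nlink", "extra_isize",
}


def parse_features(dumpe2fs_output):
    """Return the feature names from the 'Filesystem features:' line."""
    for line in dumpe2fs_output.splitlines():
        if line.startswith("Filesystem features:"):
            # Everything after the first colon is a whitespace-separated list.
            return line.split(":", 1)[1].split()
    return []


def ext4_only_features(dumpe2fs_output):
    """Return, sorted, the listed features that are in EXT4_ONLY_FEATURES."""
    return sorted(set(parse_features(dumpe2fs_output)) & EXT4_ONLY_FEATURES)


def read_superblock(device):
    """Run 'dumpe2fs -h <device>' and return its stdout (usually needs root)."""
    result = subprocess.run(
        ["dumpe2fs", "-h", device],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Usage would be along the lines of ext4_only_features(read_superblock(device)), where device is the block device holding the filesystem. An empty list suggests, under the assumptions above, that none of the listed ext4 features were enabled when the filesystem was formatted.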