On 27 Oct 2012, Theodore Ts'o said:
> On Sat, Oct 27, 2012 at 01:45:25PM +0100, Nix wrote:
>> Ah! it's turned on by journal_async_commit. OK, that alone argues
>> against use of journal_async_commit, tested or not, and I'd not have
>> turned it on if I'd noticed that.
>>
>> (So, the combinations I'll be trying for effect on this bug are:
>>
>>  journal_async_commit (as now)
>>  journal_checksum
>>  none
>
> Can you also check and see whether the presence or absence of
> "nobarrier" makes a difference?

Done. (Also checked the effect of your patches posted earlier this week: no effect, I'm afraid, certainly not under the fails-even-on-3.6.1 test I was carrying out, umount -l'ing /var as the very last thing I did before /sbin/reboot -f.)

nobarrier makes a difference that I, at least, did not expect:

Mount options                   Filesystem      Journal
[no options]                    No corruption
nobarrier                       No corruption
journal_checksum                Corruption      Corrupted transaction, journal aborted
nobarrier,journal_checksum      Corruption      Corrupted transaction, journal aborted
journal_async_commit            Corruption      Corrupted transaction, journal aborted
nobarrier,journal_async_commit  Corruption      No corrupted transaction or aborted journal

I didn't expect the last case at all, and it adequately explains why you are mostly seeing corrupted journal messages in your tests but I was not. It also explains why when I saw this for the first time I was able to mount the resulting corrupted filesystem read-write and corrupt it further before I noticed that anything was wrong.

It is also clear that journal_checksum and all that relies on it is worse than useless right now, as Eric reported while I was testing this. It should probably be marked CONFIG_BROKEN in future 3.[346].* stable kernels, if CONFIG_BROKEN existed anymore, which it doesn't.
It's a shame journal_async_commit depends on a broken feature: it might be notionally unsafe, but on some of my systems (without nobarrier or flashy caching controllers) it was associated with a noticeable speedup of metadata-heavy workloads -- though that was way back in 2009... however, "safety first" definitely applies in this case.

-- 
NULL && (void)