Theodore Tso wrote:
> On Mon, Apr 20, 2009 at 12:43:37PM +0100, Jeremy Sanders wrote:
>> It takes a day or two to do the sync. I've only done it twice (once with
>> the old kernel, once with the new fedora testing kernel) and it happened
>> both times. I'm afraid the statistics are rather low here.
>>
>> I did a different faster test (just copying my home directory lots of
>> times), but I wasn't able to get it to fail. That test didn't use much
>> disk space, however. Maybe it's worth just dd'ing a few TB of data onto
>> the device and seeing whether that fails.
>>
>> I didn't reboot this time - I did last time. I just unmounted the file
>> system and fsck'd it. The filesystem is 8.2TB and the data is around
>> 2.5TB.

I think trying a filesystem with just under 8T would be a useful test too.

> That's useful data. I wish we could make it fail more quickly
> on a smaller rsync, but the fact that you didn't need to reboot is
> definitely useful information.
>
> And this is a fresh rsync so no files were being deleted, rsync should
> have just been writing new files to .filename.XXXXX and then renaming
> the filename to filename.XXXXX when it is done, right?
>
> OK, let me think about this a little. I think we can create a patch
> which checks for writes to the block group descriptors and dumps a
> stack trace. That would allow us to catch the failing code in question
> in the act, and maybe figure out what is going on.

XFS has block-zero tests, because there was once a bug where
uninitialized block numbers in buffers were clobbering the superblock
at block 0. It was helpful, so I think this is a good idea, Ted.

-Eric