On Mon, 20 Apr 2009, Theodore Tso wrote:
On Mon, Apr 20, 2009 at 10:33:09AM +0100, Jeremy Sanders wrote:
However, the system seems to mostly work, so I recreated the ext4 device,
ran my backup script again and fsck'd the device. It seems the
problem is reproducible with the new kernel:
When you say reproducible, how many times have you tried it, and were
you able to reproduce it every single time? 50% of time? I do
believe there is a problem, but we haven't been able to find something
where it's easily reproducible. So if you can easily reproduce this,
this is definitely very exciting.
It takes a day or two to do the sync. I've only done it twice (once with
the old kernel, once with the new Fedora testing kernel) and it happened
both times. I'm afraid two runs is not much in the way of statistics.
I did a different faster test (just copying my home directory lots of
times), but I wasn't able to get it to fail. That test didn't use much
disk space, however. Maybe it's worth just dd'ing a few TB of data onto
the device and seeing whether that fails.
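
A rough sketch of what such a bulk-write test could look like, in Python
rather than dd so that each file's checksum is recorded as it is written.
Everything here is an assumption for illustration: the function names, the
file naming scheme and the 1 MiB write size are made up, and this has not
been run against the failing array. Note that reading the files straight
back will mostly hit the page cache, so the read-back comparison is only a
sanity check; the real test is still to unmount and fsck afterwards.

```python
import hashlib
import os

CHUNK = 1024 * 1024  # write and read in 1 MiB pieces (arbitrary choice)


def write_file(path, size_bytes):
    """Write size_bytes of random data to path and return its SHA-256."""
    digest = hashlib.sha256()
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            block = os.urandom(min(CHUNK, remaining))
            f.write(block)
            digest.update(block)
            remaining -= len(block)
        # Push the data out to the device before moving on.
        f.flush()
        os.fsync(f.fileno())
    return digest.hexdigest()


def hash_file(path):
    """Return the SHA-256 of the file's current contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def fill_and_verify(directory, n_files, file_size):
    """Write n_files files of file_size bytes, then re-read and compare.

    Returns the list of paths whose contents no longer match what was
    written (empty if everything read back correctly).
    """
    os.makedirs(directory, exist_ok=True)
    expected = {}
    for i in range(n_files):
        path = os.path.join(directory, "fill_%06d.bin" % i)
        expected[path] = write_file(path, file_size)
    return [p for p, h in expected.items() if hash_file(p) != h]
```

Something like fill_and_verify("/mnt/test/fill", 2500, 1024**3) would put
roughly 2.5TB on the filesystem, where /mnt/test is a hypothetical mount
point for the device. os.urandom may well be the bottleneck at that size,
so a cheaper data source might be needed in practice.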
[root@xback2 ~]# fsck /dev/md0
fsck 1.41.4 (27-Jan-2009)
e2fsck 1.41.4 (27-Jan-2009)
fsck.ext4: Group descriptors look bad... trying backup blocks...
Group descriptor 0 checksum is invalid. Fix<y>?
Do you have to reboot to see this, or is it enough to unmount the
filesystem? How big is the ext4 filesystem, and how big was the
amount of data that you rsync'ed? One thing that would be worth
trying if you can easily reproduce is whether it happens on a single
device disk, or whether it only shows up when you use a /dev/mdX
device.
I didn't reboot this time - I did last time. I just unmounted the file
system and fsck'd it. The filesystem is 8.2TB and the data is around 2.5TB.
The drives are on a 3ware card, so I could configure the card as a single
raid5 device and try to reproduce it there. It may take a day or two to
copy the data if I try this.
Jeremy
--
Jeremy Sanders <jss@xxxxxxxxxxxxx> http://www-xray.ast.cam.ac.uk/~jss/
X-Ray Group, Institute of Astronomy, University of Cambridge, UK.
Public Key Server PGP Key ID: E1AAE053