Re: ext4 corruption during unexpected power cycle in the middle of writing

Eric Sandeen <sandeen@xxxxxxxxxx> · Wed, 06 Jun 2012 00:31:34 -0500

On 6/6/12 12:24 AM, Ming Lei wrote:
> I ran a power-cycle test in the middle of a file write, and after bootup I ran a forced fsck and found two errors (if I run fsck -p -v I don't see the errors). From what I saw, I think it is filesystem metadata corruption. Fsck can repair it, but every time I run the same test I hit the same issue.
> 
> I don't think it is relevant, but I want to point out that sda6 is on the same drive as another partition, sda3, which is used in the RAID6 array for /var.
> 
> The same issue occurs whether barriers are on or off, and whether the drive's write cache is enabled or disabled. The test result shown below is with barriers on and the disk write cache disabled.
> 
> I am using kernel 2.6.32 (SL6). I also see the same issue with ext3 on a 2.6.9-based kernel on the same hardware.
> 
> My question is:
> 1) Is the issue caused by something unique to my box? A configuration error?
> 2) Is it possible my version of fsck reported false errors?

Sort of.  You got:

> Free blocks count wrong (118366120, counted=76269471).
> Fix? yes
> 
> Free inodes count wrong (30081013, counted=30081004).
> Fix? yes

Those are the superblock counters, which aren't journaled; only the block group (bg) counters are logged via the journal, IIRC.

They aren't false... they are just expected given the design, I'm afraid.

If you had mounted and unmounted the filesystem before running fsck, you wouldn't have seen these errors, because at mount time the global superblock counters are recomputed from all of the individual bg counters in ext4_fill_super:

        /*
         * The journal may have updated the bg summary counts, so we
         * need to update the global counters.
         */

> 3) Is this a known issue? Is it a kernel bug?

Yes.  Not really.  ;)

> 4) How do I find what's wrong?

I think this is by design, though it is a little unfortunate, since nobody expects fsck errors on a journaling filesystem after a crash...

I ran into this same thing when doing recovery testing on filesystems larger than 16T.

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html