Re: Failure growing xfs with linux 3.10.5

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 14 Aug 2013 16:20:41 +1000

On Tue, Aug 13, 2013 at 05:30:58PM +0200, Michael Maier wrote:
> Dave Chinner wrote:
> > [ re-ccing the list, because finding this is in everyone's interest ]
> > 
> > On Mon, Aug 12, 2013 at 06:25:16PM +0200, Michael Maier wrote:
> >> Eric Sandeen wrote:
> >>> On 8/11/13 2:11 AM, Michael Maier wrote:
> >>>> Hello!
> >>>>
> >>>> I think I'm facing the same problem as already described here:
> >>>> http://thread.gmane.org/gmane.comp.file-systems.xfs.general/54428
> >>>
> >>> Maybe you can try the tracing Dave suggested in that thread?
> >>> It certainly does look similar.
> >>
> >> I attached a trace report while executing xfs_growfs /mnt on linux 3.10.5 (does not happen with 3.9.8).
> >>
> >> xfs_growfs /mnt
> >> meta-data=/dev/mapper/backupMy-daten3 isize=256    agcount=42, agsize=7700480 blks
> >>          =                       sectsz=512   attr=2
> >> data     =                       bsize=4096   blocks=319815680, imaxpct=25
> >>          =                       sunit=0      swidth=0 blks
> >> naming   =version 2              bsize=4096   ascii-ci=0
> >> log      =internal               bsize=4096   blocks=60160, version=2
> >>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> >> realtime =none                   extsz=4096   blocks=0, rtextents=0
> >> xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning
> >> data blocks changed from 319815680 to 346030080
> >>
> >> The entry in messages was:
> >>
> >> Aug 12 18:09:50 dualc kernel: [  257.368030] ffff8801e8dbd400: 58 46 53 42 00 00 10 00 00 00 00 00 13 10 00 00  XFSB............
> >> Aug 12 18:09:50 dualc kernel: [  257.368037] ffff8801e8dbd410: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >> Aug 12 18:09:50 dualc kernel: [  257.368042] ffff8801e8dbd420: 46 91 c6 80 a9 a9 4d 8c 8f e2 18 fd e8 7f 66 e1  F.....M.......f.
> >> Aug 12 18:09:50 dualc kernel: [  257.368045] ffff8801e8dbd430: 00 00 00 00 04 00 00 04 00 00 00 00 00 00 00 80  ................
> >> Aug 12 18:09:50 dualc kernel: [  257.368051] XFS (dm-33): Internal error xfs_sb_read_verify at line 730 of file
> >> /daten2/tmp/rpm/BUILD/kernel-desktop-3.10.5/linux-3.10/fs/xfs/xfs_mount.c.  Caller 0xffffffffa099a2fd
> > .....
> >> Aug 12 18:09:50 dualc kernel: [  257.368533] XFS (dm-33): Corruption detected. Unmount and run xfs_repair
> >> Aug 12 18:09:50 dualc kernel: [  257.368611] XFS (dm-33): metadata I/O error: block 0x3ac00000 ("xfs_trans_read_buf_map") error 117 numblks 1
> >> Aug 12 18:09:50 dualc kernel: [  257.368623] XFS (dm-33): error 117 reading secondary superblock for ag 16
> > 
> > Ok, so that's reading the secondary superblock for AG 16. You're
> > growing the filesystem from 42 to 45 AGs, so this problem is not
> > related to the actual grow operation - it's tripping over a problem
> > that already exists on disk before the grow operation is started.
> > i.e. this is likely to be a real corruption being seen, and it
> > happened some time in the distant past and so we probably won't ever
> > be able to pinpoint the cause of the problem.
> > 
> > That said, let's have a look at the broken superblock. Can you post
> > the output of the commands:
> > 
> > # xfs_db -r -c "sb 16" -c p <dev>
> 
> done after the failed growfs mentioned above:

Looks fine....

> > and
> > 
> > # xfs_db -r -c "sb 16" -c "type data" -c p <dev>
> 
> 000: 58465342 00001000 00000000 13100000 00000000 00000000 00000000 00000000
> 020: 4691c680 a9a94d8c 8fe218fd e87f66e1 00000000 04000004 00000000 00000080
> 040: 00000000 00000081 00000000 00000082 00000001 00758000 0000002a 00000000
> 060: 0000eb00 b4a40200 01000010 00000000 00000000 00000000 0c090804 17000019
> 080: 00000000 00001940 00000000 00000277 00000000 001126ba 00000000 00000000
> 0a0: 00000000 00000000 00000000 00000000 00000000 00000002 00000000 00000000
> 0c0: 00000000 00000001 0000000a 0000000a 8f980320 73987e9e db829704 ef73fe2e
> 0e0: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
> 100: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
> 120: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
> 140: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
> 160: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
> 180: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
> 1a0: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
> 1c0: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
> 1e0: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e

There's your problem - the empty space in the superblock is supposed
to be zero. mkfs zeros it and we rely on it being zero for various
reasons.
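
For anyone who wants to check a dump like the one above mechanically, here is a minimal Python sketch. It parses rows in the "offset: words" layout shown above and lists the non-zero bytes past a chosen boundary. The names parse_rows, nonzero_offsets and USED_BYTES are made up for illustration, and the 0xd0 boundary is an assumption taken from where the repeating pattern starts in this dump, not from the on-disk format definition.

```python
def parse_rows(dump):
    """Map byte offset -> byte value for rows shaped like 'OFF: word word ...'."""
    table = {}
    for line in dump.strip().splitlines():
        offset_text, words = line.split(":", 1)
        offset = int(offset_text, 16)
        row = bytes.fromhex("".join(words.split()))
        for i, value in enumerate(row):
            table[offset + i] = value
    return table


def nonzero_offsets(table, start):
    """Return the sorted offsets at or after 'start' that hold a non-zero byte."""
    return sorted(off for off, value in table.items() if off >= start and value)


# Three rows copied from the dump of superblock 16 above.
SAMPLE = """
0a0: 00000000 00000000 00000000 00000000 00000000 00000002 00000000 00000000
0c0: 00000000 00000001 0000000a 0000000a 8f980320 73987e9e db829704 ef73fe2e
0e0: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
"""

# Assumption: the used fields end at 0xd0, which is simply where the
# repeating pattern begins in this particular dump.
USED_BYTES = 0xD0

table = parse_rows(SAMPLE)
bad = nonzero_offsets(table, USED_BYTES)
if bad:
    print(f"{len(bad)} non-zero bytes past 0x{USED_BYTES:x}, first at 0x{bad[0]:x}")
else:
    print("unused space is zero")
```

Feeding it the full 512-byte dump instead of the three sample rows would show the garbage running to the end of the sector.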

And one of those reasons is that we use the fact it should be zero
to determine if we should be checking the CRC of the superblock.
That is, if there's a single bit error in the superblock and we are
missing the correct bit in the version number that says CRCs are
enabled, we use the fact that the superblock CRC field - which your
filesystem knows nothing about - should be zero to validate that
the CRC feature bit is correctly set. The above superblock
indicates that there is a CRC set on the superblock, but the
version number does not say CRCs are enabled, and so we have a
corruption in that superblock that the kernel code cannot handle
without a user telling it what is correct.
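
The decision described above can be sketched in a few lines of Python. This is a paraphrase of the logic as explained here, not a copy of the kernel verifier. The function names and the crc_matches parameter (a stand-in for the real checksum comparison) are hypothetical. The version mask and the value 5 are what I understand the kernel's XFS_SB_VERSION_NUMBITS and XFS_SB_VERSION_5 to be.

```python
XFS_SB_VERSION_NUMBITS = 0x000F  # low bits of sb_versionnum hold the version
XFS_SB_VERSION_5 = 5             # version number used by CRC-enabled filesystems


def wants_crc_check(versionnum, crc_field):
    """Check the CRC if the version says so, or if the CRC field is non-zero."""
    is_v5 = (versionnum & XFS_SB_VERSION_NUMBITS) == XFS_SB_VERSION_5
    return is_v5 or crc_field != 0


def verify_superblock(versionnum, crc_field, crc_matches):
    """Return 'corrupt' when a CRC check is wanted and the checksum is wrong."""
    if wants_crc_check(versionnum, crc_field) and not crc_matches:
        return "corrupt"
    return "ok"


# 0xb4a4 is the version word at offset 0x64 of the dump above (version 4).
# The garbage in the unused space makes the CRC field non-zero, and garbage
# cannot be a valid checksum, so the superblock is rejected.
print(verify_superblock(0xB4A4, 0x8F980320, crc_matches=False))

# The same version 4 superblock with properly zeroed unused space passes.
print(verify_superblock(0xB4A4, 0x00000000, crc_matches=False))
```

The point of the design is that a zero CRC field on an old-format superblock is cheap evidence that the version bits have not been flipped by a bit error.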

So, the fact growfs is failing is actually the correct behaviour for
the filesystem in this case - the superblock is corrupt, just not
obviously so.

> > so we can see the exact contents of that superblock?
> > 
> > FWIW, how many times has this filesystem been grown?
> 
> I can't say for sure, about 4 or 5 times?
> 
> > Did it start
> > with only 32 AGs (i.e. 10TB in size)?
> 
> 10TB? No. The device just has 3 TB. You most probably meant 10GB?
> I'm not sure, but it definitely started with > 100GB.

I misplaced a digit. A block size of 4096 bytes and:

    agcount=42, agsize=7700480 blks

So the filesystem size is 42 * 7700480 * 4096 = 1.32TB (about 1.2TiB).
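
The same arithmetic as a small Python sketch, using only the numbers from the xfs_growfs output quoted earlier. The helper names are made up for illustration. ag_count_for is a plain ceiling division and ignores any minimum size rule for the last AG, so treat it as an approximation of what growfs does.

```python
BLOCK_SIZE = 4096   # bsize from the growfs output
AGSIZE = 7700480    # agsize in filesystem blocks
AGCOUNT = 42        # agcount before the grow


def fs_bytes(agcount, agsize, block_size):
    """Upper bound on the size in bytes: every AG counted as full size."""
    return agcount * agsize * block_size


def ag_count_for(dblocks, agsize):
    """Number of AGs needed to cover dblocks blocks (ceiling division)."""
    return -(-dblocks // agsize)


size = fs_bytes(AGCOUNT, AGSIZE, BLOCK_SIZE)
print(f"{size} bytes = {size / 10**12:.2f} TB = {size / 2**40:.2f} TiB")

# Data block counts before and after the grow, from the growfs output.
print("AGs before grow:", ag_count_for(319815680, AGSIZE))
print("AGs after grow: ", ag_count_for(346030080, AGSIZE))
```

The last AG is shorter than agsize both before and after the grow, which is why the exact data size (blocks * bsize) is slightly below the agcount * agsize figure.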

The question I'm asking is how many AGs did the filesystem start
with, because this:

commit 1375cb65e87b327a8dd4f920c3e3d837fb40e9c2
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Oct 9 14:50:52 2012 +1100

    xfs: growfs: don't read garbage for new secondary superblocks

    When updating new secondary superblocks in a growfs operation, the
    superblock buffer is read from the newly grown region of the
    underlying device. This is not guaranteed to be zero, so violates
    the underlying assumption that the unused parts of superblocks are
    zero filled. Get a new buffer for these secondary superblocks to
    ensure that the unused regions are zero filled correctly.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
    Signed-off-by: Ben Myers <bpm@xxxxxxx>

fixes the only bug I can think of that would result in
non-zero empty space in a secondary superblock. And that implies
that the filesystem started with 16 AGs or fewer, and was grown with
an older kernel with this bug in it.
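
As a cross-check that the failing read really is the superblock of AG 16, the block number from the kernel log can be mapped back to an AG with a short Python sketch. It assumes the logged block number is in 512-byte units and that each AG's superblock sits in the first sector of the AG. The function names are hypothetical.

```python
SECTOR = 512  # assumption: the logged block number is in 512-byte units


def sb_daddr(agno, agsize, block_size):
    """Sector address of the first sector of AG 'agno'."""
    return agno * agsize * (block_size // SECTOR)


def ag_of_daddr(daddr, agsize, block_size):
    """AG number that contains sector address 'daddr'."""
    return daddr // (agsize * (block_size // SECTOR))


AGSIZE = 7700480      # agsize from the growfs output
BLOCK_SIZE = 4096     # bsize from the growfs output
FAILED = 0x3AC00000   # "metadata I/O error: block 0x3ac00000" in the log

print("AG of failed block:", ag_of_daddr(FAILED, AGSIZE, BLOCK_SIZE))
print("start of AG 16:    ", hex(sb_daddr(16, AGSIZE, BLOCK_SIZE)))
```

AGs are numbered from zero, so a filesystem made with 16 AGs has AGs 0 to 15, and AG 16 is the first one a grow would have had to add.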

If it makes you feel any better, the bug that caused this had been
in the code for 15+ years and you are the first person I know of to
have ever hit it....

xfs_repair doesn't appear to have any checks in it to detect this
situation or repair it - there are some conditions for zeroing the
unused parts of a superblock, but they are focussed around detecting
and correcting damage caused by a buggy Irix 6.5-beta mkfs from 15
years ago.

Hence it looks like we'll need some new xfs_repair functionality to fix
this. It might take me a little while to get you a fix - perhaps
someone else with a little bit of spare time could get it done
sooner than I can. Anyone?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
