Re: Failure growing xfs with linux 3.10.5

On 8/14/13 11:20 AM, Michael Maier wrote:
> Dave Chinner wrote:
>> On Tue, Aug 13, 2013 at 05:30:58PM +0200, Michael Maier wrote:
>>> Dave Chinner wrote:
>>>> [ re-ccing the list, because finding this is in everyone's interest ]
>>>>
>>>> On Mon, Aug 12, 2013 at 06:25:16PM +0200, Michael Maier wrote:
>>>>> Eric Sandeen wrote:
>>>>>> On 8/11/13 2:11 AM, Michael Maier wrote:
>>>>>>> Hello!
>>>>>>>
>>>>>>> I think I'm facing the same problem as already described here:
>>>>>>> http://thread.gmane.org/gmane.comp.file-systems.xfs.general/54428
>>>>>>
>>>>>> Maybe you can try the tracing Dave suggested in that thread?
>>>>>> It certainly does look similar.
>>>>>
>>>>> I attached a trace report taken while executing xfs_growfs /mnt on linux 3.10.5 (the failure does not happen with 3.9.8).
>>>>>
>>>>> xfs_growfs /mnt
>>>>> meta-data=/dev/mapper/backupMy-daten3 isize=256    agcount=42, agsize=7700480 blks
>>>>>          =                       sectsz=512   attr=2
>>>>> data     =                       bsize=4096   blocks=319815680, imaxpct=25
>>>>>          =                       sunit=0      swidth=0 blks
>>>>> naming   =version 2              bsize=4096   ascii-ci=0
>>>>> log      =internal               bsize=4096   blocks=60160, version=2
>>>>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>>>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>>> xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning
>>>>> data blocks changed from 319815680 to 346030080
>>>>>
>>>>> The entry in messages was:
>>>>>
>>>>> Aug 12 18:09:50 dualc kernel: [  257.368030] ffff8801e8dbd400: 58 46 53 42 00 00 10 00 00 00 00 00 13 10 00 00  XFSB............
>>>>> Aug 12 18:09:50 dualc kernel: [  257.368037] ffff8801e8dbd410: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>> Aug 12 18:09:50 dualc kernel: [  257.368042] ffff8801e8dbd420: 46 91 c6 80 a9 a9 4d 8c 8f e2 18 fd e8 7f 66 e1  F.....M.......f.
>>>>> Aug 12 18:09:50 dualc kernel: [  257.368045] ffff8801e8dbd430: 00 00 00 00 04 00 00 04 00 00 00 00 00 00 00 80  ................
>>>>> Aug 12 18:09:50 dualc kernel: [  257.368051] XFS (dm-33): Internal error xfs_sb_read_verify at line 730 of file
>>>>> /daten2/tmp/rpm/BUILD/kernel-desktop-3.10.5/linux-3.10/fs/xfs/xfs_mount.c.  Caller 0xffffffffa099a2fd
>>>> .....
>>>>> Aug 12 18:09:50 dualc kernel: [  257.368533] XFS (dm-33): Corruption detected. Unmount and run xfs_repair
>>>>> Aug 12 18:09:50 dualc kernel: [  257.368611] XFS (dm-33): metadata I/O error: block 0x3ac00000 ("xfs_trans_read_buf_map") error 117 numblks 1
>>>>> Aug 12 18:09:50 dualc kernel: [  257.368623] XFS (dm-33): error 117 reading secondary superblock for ag 16
>>>>
>>>> Ok, so that's reading the secondary superblock for AG 16. You're
>>>> growing the filesystem from 42 to 45 AGs, so this problem is not
>>>> related to the actual grow operation - it's tripping over a problem
>>>> that already exists on disk before the grow operation is started.
>>>> i.e. this is likely to be a real corruption being seen, and it
>>>> happened some time in the distant past and so we probably won't ever
>>>> be able to pinpoint the cause of the problem.
>>>>
>>>> That said, let's have a look at the broken superblock. Can you post
>>>> the output of the commands:
>>>>
>>>> # xfs_db -r -c "sb 16" -c p <dev>
>>>
>>> done after the failed growfs mentioned above:
>>
>> Looks fine....
>>
>>>> and
>>>>
>>>> # xfs_db -r -c "sb 16" -c "type data" -c p <dev>
>>>
>>> 000: 58465342 00001000 00000000 13100000 00000000 00000000 00000000 00000000
>>> 020: 4691c680 a9a94d8c 8fe218fd e87f66e1 00000000 04000004 00000000 00000080
>>> 040: 00000000 00000081 00000000 00000082 00000001 00758000 0000002a 00000000
>>> 060: 0000eb00 b4a40200 01000010 00000000 00000000 00000000 0c090804 17000019
>>> 080: 00000000 00001940 00000000 00000277 00000000 001126ba 00000000 00000000
>>> 0a0: 00000000 00000000 00000000 00000000 00000000 00000002 00000000 00000000
>>> 0c0: 00000000 00000001 0000000a 0000000a 8f980320 73987e9e db829704 ef73fe2e
>>> 0e0: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
>>> 100: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
>>> 120: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
>>> 140: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
>>> 160: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
>>> 180: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
>>> 1a0: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
>>> 1c0: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
>>> 1e0: 8f980320 73987e9e db829704 ef73fe2e 8f980320 73987e9e db829704 ef73fe2e
>>
>> There's your problem - the empty space in the superblock is supposed
>> to be zero. mkfs zeros it and we rely on it being zero for various
>> reasons.
>>
>> And one of those reasons is that we use the fact it should be zero
>> to determine whether we should be checking the CRC of the
>> superblock. That is, if there's a single bit error in the superblock
>> and we are missing the correct bit in the version numbers that say
>> CRCs are enabled, we use the fact that the superblock CRC field -
>> which your filesystem knows nothing about - should be zero to
>> validate that the CRC feature bit is correctly set. The above
>> superblock will indicate that there is a CRC set on the superblock,
>> find that the necessary version number is not correct, and therefore
>> conclude that we have a corruption in that superblock that the
>> kernel code cannot handle without a user telling it what is correct.
>>
>> So, the fact growfs is failing is actually the correct behaviour for
>> the filesystem to have in this case - the superblock is corrupt,
>> just not obviously so.
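
[A minimal, standalone illustration of the check described above - not
the actual fs/xfs verifier code. The 208-byte cutoff and the names are
assumptions, chosen to match the hexdump earlier in the thread where
the stale garbage begins at offset 0xd0:]

/*
 * Illustration only: on a v4 (non-CRC) filesystem, every byte past
 * the last v4 superblock field - including where a v5 CRC would
 * live - must be zero, otherwise the superblock is treated as
 * corrupt and the read is rejected.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SB_SECTOR_SIZE   512   /* sector size of this filesystem  */
#define SB_V4_USED_BYTES 208   /* assumed end of the v4 sb fields */

static bool sb_unused_tail_is_zero(const uint8_t *sector, bool is_v5)
{
        if (is_v5)
                return true;    /* v5 runs a real CRC check instead */

        /* v4: any non-zero byte in the "unused" tail makes the CRC */
        /* feature state ambiguous - this is what tripped the read  */
        /* of the secondary superblock for AG 16 above.             */
        for (size_t i = SB_V4_USED_BYTES; i < SB_SECTOR_SIZE; i++) {
                if (sector[i] != 0)
                        return false;
        }
        return true;
}
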
>>
>>>> so we can see the exact contents of that superblock?
>>>>
>>>> FWIW, how many times has this filesystem been grown?
>>>
>>> I can't say for sure, about 4 or 5 times?
>>>
>>>> Did it start
>>>> with only 32 AGs (i.e. 10TB in size)?
>>>
>>> 10TB? No. The device just has 3 TB. You most probably meant 10GB?
>>> I'm not sure, but it definitely started with > 100GB.
>>
>> I misplaced a digit. A block size of 4096 bytes and:
>>
>>     agcount=42, agsize=7700480 blks
>>
>> So the filesystem size is 42 * 7700480 * 4096 bytes, or about 1.2TiB.
>>
>> The question I'm asking is how many AGs did the filesystem start
>> with, because this:
>>
>> commit 1375cb65e87b327a8dd4f920c3e3d837fb40e9c2
>> Author: Dave Chinner <dchinner@xxxxxxxxxx>
>> Date:   Tue Oct 9 14:50:52 2012 +1100
>>
>>     xfs: growfs: don't read garbage for new secondary superblocks
>>     
>>     When updating new secondary superblocks in a growfs operation, the
>>     superblock buffer is read from the newly grown region of the
>>     underlying device. This is not guaranteed to be zero, so violates
>>     the underlying assumption that the unused parts of superblocks are
>>     zero filled. Get a new buffer for these secondary superblocks to
>>     ensure that the unused regions are zero filled correctly.
>>     
>>     Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
>>     Reviewed-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
>>     Signed-off-by: Ben Myers <bpm@xxxxxxx>
>>
>> Is the only possible reason I can think of that would result in
>> non-zero empty space in a secondary superblock. And that implies
>> that the filesystem started with 16 AGs or less,
> 
> yes
> 
>> and was grown with
>> an older kernel with this bug in it.
> 
> yes.
> 
>> If it makes you feel any better, the bug that caused this had been
>> in the code for 15+ years and you are the first person I know of to
>> have ever hit it....
> 
> Probably the second one :-) See
> http://thread.gmane.org/gmane.comp.file-systems.xfs.general/54428
> 
>> xfs_repair doesn't appear to have any checks in it to detect this
>> situation or repair it - there are some conditions for zeroing the
>> unused parts of a superblock, but they are focussed around detecting
>> and correcting damage caused by a buggy Irix 6.5-beta mkfs from 15
>> years ago.
> 
> The _big problem_ is: xfs_repair not only fails to repair it, it
> _causes data loss_ in some situations!
> 
> Given the following situation I ran into:
> - xfs_growfs was started while running linux 3.10.5.
> 
> - Saw the error message on the console:
> XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning
> data blocks changed from 319815680 to 346030080
> 
> - Checked with df -> the grow seemed to have completed. Decision:
> analyse the problem later when there is more time.
> 
> - Some days later, entry found in messages:
> "Corruption detected. Unmount and run xfs_repair"
> 
> - I did as suggested.
>   Result: the FS is back to its original size from before the grow,
> and all data written since the faulty grow is completely lost. And:
> the FS still isn't repaired.
> If it is not a problem at all (that's how I understood you here), why
> is there an error message and the suggestion to run xfs_repair, which
> obviously isn't able to repair this problem at all but leads directly
> to data loss?

It seems that perhaps the error during growfs has left the filesystem
in a dangerous state - perhaps 45 AGs in memory but only 42 on disk,
I'm not certain.  So you proceeded with the mounted fs thinking it had
more space, but when you did the subsequent repair, it only found 42
on disk, and "fixed" it by removing anything past that.  </handwave>
 
> Thanks for your clarification. I hope other people read this thread
> before they lose data :-(.
> 
> 
> What to do now?
> - Don't use >= 3.10.x kernel. Or:

That's probably a decent workaround in the short term, at least for
xfs_growfs.

> - Ignore it (how can I distinguish this case from other cases?) Or:
> - Recreate the complete FS.

or:

- wait a bit 'til we get xfs_repair fixed to address the root cause.
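
[For illustration, a sketch of the kind of cleanup such a repair-side
fix would amount to: zero everything past the last v4 field in each
secondary superblock before writing it back. This is not xfs_repair
code; the constants are the same assumptions as in the sketch earlier
in the thread:]

#include <stdint.h>
#include <string.h>

#define SB_SECTOR_SIZE   512   /* sector size of this filesystem  */
#define SB_V4_USED_BYTES 208   /* assumed end of the v4 sb fields */

/* Zero the stale tail of a v4 secondary superblock buffer so that */
/* the read verifier accepts it again; the caller would then write */
/* the sector back to disk.                                        */
static void sb_zero_unused_tail(uint8_t *sector)
{
        memset(sector + SB_V4_USED_BYTES, 0,
               SB_SECTOR_SIZE - SB_V4_USED_BYTES);
}
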

I'll take a look at the image you provided me with, and see if I can
make some quick progress.

-Eric

> 
> Thanks for clarification,
> regards,
> Michael.
> 

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs


