Fwd: e2fsck -fD corruption of large htree/extent directory

Ted, per our discussion this morning, here are the details of the
e2fsck -fD corruption problem we saw.

Running e2fsck -fD on filesystems with large extent+htree directories
(> 300k entries, 1600+ filesystem blocks) corrupted a large number of
those directories.
This is definitely caused by a bug in the code rather than hardware, as
this corrupted multiple large directories on 11 different systems.
Sometimes, similar directories on the same systems did not have errors.

As yet the reason and mechanism have not been determined, but it may
relate to the filesystem history (the directories may have originally
been block mapped, and in any case the blocks are mostly discontiguous
on disk).  These dirs undergo continuous insertion and deletion of
entries with ~10-character filenames, so the leaf blocks may have become
quite fragmented over time.

Running e2fsck on the filesystem showed:

   e2fsck 1.42.12.wc1 (15-Sep-2014)
   MMP interval is 5 seconds and total wait time is 22 seconds. Please wait.
   Pass 1: Checking inodes, blocks, and sizes
   Interior extent node level 1 of inode 39321606:
   Logical start 1430 does not match logical start 1875 at next level.
   Fix? yes

   Inode 39321606, end of extent exceeds allowed value
   (logical block 1875, physical block 1258402260, len 1)
   Clear? yes

   Failed to iterate extents in inode 39321606
   (op EXT2_EXTENT_UP, blk 1258402260, lblk 1875): No 'up' extent
   Clear inode? yes

   Inode 39321606 is a zero-length directory.  Clear? yes

   Update quota info for quota type 0? yes
   Update quota info for quota type 1? yes

   Restarting e2fsck from the beginning...
   Pass 1: Checking inodes, blocks, and sizes
   Pass 2: Checking directory structure
   Entry 'd2' in /O/0 (39321602) has deleted/unused inode 39321606.
   Clear? yes

   Pass 3: Checking directory connectivity
   Pass 4: Checking reference counts
   Unattached inode 147
   Connect to /lost+found? yes
   Inode 147 ref count is 2, should be 1.  Fix? yes

   Unattached inode 173
   Connect to /lost+found? yes
   Inode 173 ref count is 2, should be 1.  Fix? yes
   :
   :
   Unattached inode 92016391
   Connect to /lost+found? yes

   Inode 92016391 ref count is 2, should be 1.  Fix? yes
   Pass 5: Checking group summary information
   Block bitmap differences:  -1258308100

   Update quota info for quota type 0? yes
   Update quota info for quota type 1? yes

   scratch-OST0049: ***** FILE SYSTEM WAS MODIFIED *****

Stat data for the corrupted directory inode:

   debugfs -c -R "stat <39321606>"
   Inode: 39321606 Type: directory Mode: 0700 Flags: 0x81000
   Generation: 2310511783 Version: 0x00000000:00000000
   User: 0 Group: 0 Size: 6750208
   File ACL: 0 Directory ACL: 0
   Links: 2 Blockcount: 13232
   Fragment: Address: 0 Number: 0 Size: 0
   ctime: 0x563111cf:15fb2694 -- Wed Oct 28 14:19:59 2015
   atime: 0x52f30c97:9fe5c3ac -- Wed Feb 5 23:16:23 2014
   mtime: 0x563111cf:15fb2694 -- Wed Oct 28 14:19:59 2015
   crtime: 0x52f30c97:9fe5c3ac -- Wed Feb 5 23:16:23 2014
   Size of extra inode fields: 28
   Extended attributes stored in inode body:
   invalid EA entry in inode
   EXTENTS:
   [shown below]

The debugfs dump_extents command shows that the extent tree is mostly OK.
In all observed cases, the extent tree had 5 extent (leaf) blocks.  This
may be a result of 4 extent blocks being moved out of the in-inode
i_block[] array and into an external second-level index block, or it may
simply be because the number of entries in each directory is roughly the
same; I am not sure which.
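
As a rough cross-check of the five-block observation, here is a minimal
Python sketch of the extent-tree capacity arithmetic.  It assumes a
4096-byte block size (not stated above, but consistent with Size: 6750208
for 1648 blocks) and the standard 12-byte extent header and 12-byte
extent/index entries.  The variable names are illustrative only.

```python
# Rough ext4 extent-tree capacity check (illustrative only).
EXTENT_HEADER_BYTES = 12   # struct ext4_extent_header
EXTENT_ENTRY_BYTES = 12    # struct ext4_extent / struct ext4_extent_idx
I_BLOCK_BYTES = 60         # i_block[] area inside the inode
BLOCK_BYTES = 4096         # assumed filesystem block size


def entries_per_node(node_bytes):
    """Number of extent or index entries that fit after the header."""
    return (node_bytes - EXTENT_HEADER_BYTES) // EXTENT_ENTRY_BYTES


in_inode_entries = entries_per_node(I_BLOCK_BYTES)
per_block_entries = entries_per_node(BLOCK_BYTES)

# Worst case from the dump: nearly every extent maps a single block,
# so a 1648-block directory needs about 1648 extents.
dir_blocks = 1648
leaf_blocks_needed = -(-dir_blocks // per_block_entries)  # ceiling division

# More leaves than in-inode slots forces an extra index level.
needs_index_block = leaf_blocks_needed > in_inode_entries

print(in_inode_entries, per_block_entries, leaf_blocks_needed, needs_index_block)
```

Under these assumptions a fully fragmented directory of this size needs
more leaf blocks than fit in i_block[], which would be consistent with
both explanations above rather than distinguishing between them.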

Level Entries       Logical                Physical Length Flags
 0/ 2   1/  1     0 -  1647 1258392344                1648
 1/ 2   1/  5     0 -   353 1258308301                 354
 2/ 2   1/340     0 -     0 1258308100 - 1258308100      1
 2/ 2   2/340     1 -     2 1258308174 - 1258308175      2
 2/ 2   3/340     3 -     3 1258308213 - 1258308213      1
 2/ 2   4/340     4 -     4 1258308241 - 1258308241      1
 :
 :
 2/ 2 339/340   352 -   352 1258319291 - 1258319291      1
 2/ 2 340/340   353 -   353 1258319375 - 1258319375      1
 1/ 2   2/  5   354 -   704 1258319416                 351
 2/ 2   1/340   354 -   354 1258319415 - 1258319415      1
 2/ 2   2/340   355 -   355 1258319470 - 1258319470      1
 :
 :
 2/ 2 339/340   703 -   703 1258350886 - 1258350886      1
 2/ 2 340/340   704 -   704 1258350895 - 1258350895      1
 1/ 2   3/  5   705 -  1055 1258350929                 351
 2/ 2   1/339   705 -   705 1258350928 - 1258350928      1
 2/ 2   2/339   706 -   706 1258343948 - 1258343948      1
 :
 :
 2/ 2 336/339  1052 -  1052 1258365348 - 1258365348      1
 2/ 2 337/339  1053 -  1053 1258365355 - 1258365355      1
 2/ 2 338/339  1054 -  1054 1258365417 - 1258365417      1
 2/ 2 339/339  1055 -  1055 1258365432 - 1258365432      1
 1/ 2   4/  5  1056 -  1874 1258324458                 819
 2/ 2   1/340  1056 -  1056 1258365435 - 1258365435      1
 2/ 2   2/340  1057 -  1057 1258366983 - 1258366983      1
 2/ 2   3/340  1058 -  1059 1258366993 - 1258366994      2
 :
 :
 2/ 2 338/340  1427 -  1427 1258379312 - 1258379312      1
 2/ 2 339/340  1428 -  1428 1258379117 - 1258379117      1
 2/ 2 340/340  1429 -  1429 1258379133 - 1258379133      1
 1/ 2   5/  5  1875 - 4294968943 1258406330              4294967069
 2/ 2   1/  1  1875 -  1875 1258402260 - 1258402260      1

The 4/5 extent index entry shows an extent length of 1874 - 1056 + 1 = 819
blocks, but the extent block below it only maps 1429 - 1056 + 1 = 374
blocks.  The extent root block reports 1648 blocks, which matches both
i_size and i_blocks.  There appears to be one block missing from the
extent tree, or it was clobbered by 5/5 during an update, and/or the
starting offset of block 5/5 is just wrong.
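
The numbers in the dump can be cross-checked with a short Python sketch.
The values are copied by hand from the dump_extents and stat output
above.  The 4096-byte block size and the reading of the 5/5 length as a
32-bit wraparound are assumptions, and the helper name leaf_gaps is
hypothetical.

```python
BLOCK_BYTES = 4096                 # assumed filesystem block size
U32_MAX = 2**32 - 1

# Level-1 index entries: (name, logical start, logical end as printed).
index_entries = [
    ("1/5", 0, 353),
    ("2/5", 354, 704),
    ("3/5", 705, 1055),
    ("4/5", 1056, 1874),
    ("5/5", 1875, 4294968943),
]
# Last logical block actually mapped by the leaf under each index entry.
leaf_last = {"1/5": 353, "2/5": 704, "3/5": 1055, "4/5": 1429, "5/5": 1875}


def leaf_gaps(entries, last_mapped):
    """Return (name, first unmapped, last unmapped, count) per index entry."""
    gaps = []
    for name, _start, end in entries:
        if end > U32_MAX:
            continue  # printed end overflowed 32 bits; examined below
        last = last_mapped[name]
        if last < end:
            gaps.append((name, last + 1, end, end - last))
    return gaps


gaps = leaf_gaps(index_entries, leaf_last)

# If the 5/5 entry is assumed to end at the root's last block (1647),
# its length underflows as an unsigned 32-bit value.
root_last = 1647
wrapped_len = (root_last + 1 - 1875) % 2**32

# Inode fields from the stat output.
size_blocks = 6750208 // BLOCK_BYTES           # i_size in fs blocks
iblocks_fs = 13232 * 512 // BLOCK_BYTES        # i_blocks in fs blocks
non_data_blocks = iblocks_fs - size_blocks

print(gaps, wrapped_len, size_blocks, iblocks_fs, non_data_blocks)
```

The only gap reported is under 4/5 and starts at logical block 1430,
the same value e2fsck printed as the expected logical start.  The
wrapped length equals the 4294967069 shown for 5/5, which suggests that
length is an artifact of the bad start offset rather than a separately
stored value.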

There doesn't appear to be any other data corruption in the filesystem
besides the directory extent blocks, but this resulted in several
hundred leaf blocks being lost per directory, resulting in millions of
files in lost+found (see my other recent email on that topic).

In some cases, it appears that 100% of files were readable from the
corrupted directory using debugfs _before_ the e2fsck was run:

   debugfs -c -R "ls -l $DIR" $DEV

even though e2fsck was unhappy with the extent structure, cleared part
of the extent tree, and dumped the files into lost+found.  This implies
that the directory entries were all moved into the first blocks of the
directory (i.e. leaf blocks under extent indices 1/5..4/5), that the
blocks in the corrupt part of the directory were somehow "extra", and
that the bug lies in the extent handling when shrinking the directory.

Cheers, Andreas




