G'day all,
The machine was running kernel 3.13.5, x86_64.
I had a 12-device (2 TB drives) RAID-6 formatted as ext4. I added 2 drives
to the underlying md array and reshaped it (no issues). After the reshape
I attempted an online resize using e2fsprogs 1.42.5 (Debian stable). This
failed with a message about the size not fitting into 32 bits, so I
compiled 1.42.11 and tried again.
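For reference, the grow sequence was roughly the following (reconstructed
from memory; the device names here are illustrative, not the actual ones):

  mdadm --add /dev/md0 /dev/sdX /dev/sdY
  mdadm --grow /dev/md0 --raid-devices=14
  (wait for the reshape to finish, then, with the fs still mounted)
  resize2fs /dev/md0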
This resulted in a message, which I no longer have, indicating that
something went wrong. I attempted the resize a couple more times (how dumb
am I?). The relevant parts of dmesg are:
Jul 20 17:20:13 srv kernel: [11893469.381692] EXT4-fs (md0): resizing filesystem from 4883458240 to 5860149888 blocks
Jul 20 17:20:23 srv kernel: [11893479.597505] EXT4-fs (md0): resized to 5128585216 blocks
Jul 20 17:20:43 srv kernel: [11893499.681961] EXT4-fs (md0): resized to 5525995520 blocks
Jul 20 17:20:53 srv kernel: [11893509.762719] EXT4-fs (md0): resized to 5641863168 blocks
Jul 20 17:21:02 srv kernel: [11893517.869988] EXT4-fs warning (device md0): verify_reserved_gdb:705: reserved GDT 2769 missing grp 177147 (5804755665)
Jul 20 17:21:02 srv kernel: [11893517.906663] EXT4-fs (md0): resized filesystem to 5860149888
Jul 20 17:21:08 srv kernel: [11893523.795964] EXT4-fs warning (device md0): ext4_group_extend:1712: can't shrink FS - resize aborted
Jul 20 17:21:17 srv kernel: [11893533.224440] EXT4-fs (md0): resizing filesystem from 5804916736 to 5860149888 blocks
Jul 20 17:21:17 srv kernel: [11893533.261982] EXT4-fs warning (device md0): verify_reserved_gdb:705: reserved GDT 2769 missing grp 177147 (5804755665)
Jul 20 17:21:17 srv kernel: [11893533.300352] EXT4-fs (md0): resized filesystem to 5860149888
Jul 20 17:21:17 srv kernel: [11893533.636745] EXT4-fs warning (device md0): ext4_group_extend:1712: can't shrink FS - resize aborted
Jul 20 17:23:11 srv kernel: [11893647.253580] EXT4-fs (md0): resizing filesystem from 5804916736 to 5860149888 blocks
Jul 20 17:23:11 srv kernel: [11893647.291562] EXT4-fs warning (device md0): verify_reserved_gdb:705: reserved GDT 2769 missing grp 177147 (5804755665)
Jul 20 17:23:11 srv kernel: [11893647.330267] EXT4-fs (md0): resized filesystem to 5860149888
Jul 20 17:23:12 srv kernel: [11893647.675745] EXT4-fs warning (device md0): ext4_group_extend:1712: can't shrink FS - resize aborted
At this point I thought it best to reboot the machine, so I upgraded to
3.15.6 and brought it up in single-user mode. The filesystem passed fsck
with a message about an uninitialised block group and no other errors.
I've since repeated the fsck several times and it comes up clean.
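For completeness, the checks were forced fscks of the unmounted array,
something like this (exact flags from memory):

  e2fsck -fn /dev/md0
  e2fsck -f /dev/md0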
The issue is that resize2fs now locks up hard when I retry (it just spins
on one core). Once it starts spinning, strace shows no system calls at
all, so it appears to be chasing its tail in userspace.
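Next time I trigger it I can try to capture more detail, e.g. (the -d
debug flags only do anything if they were compiled into the binary, per
the man page):

  resize2fs -d 30 /dev/md0
  gdb -p $(pidof resize2fs) -batch -ex bt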
This is the current state of the fs:
root@srv:/s# dumpe2fs -h /dev/md0
dumpe2fs 1.42.11 (09-Jul-2014)
Filesystem volume name: <none>
Last mounted on: /s/src
Filesystem UUID: 99566e8e-e66d-4351-9675-0b3a549e2ba5
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 362807296
Block count: 5804916736
Reserved block count: 0
Free blocks: 1407676872
Free inodes: 358800089
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 585
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 2048
Inode blocks per group: 128
RAID stride: 32
RAID stripe width: 320
Flex block group size: 16
Filesystem created: Wed Jul 31 15:02:47 2013
Last mount time: Sun Jul 20 17:41:16 2014
Last write time: Sun Jul 20 18:48:00 2014
Mount count: 0
Maximum mount count: -1
Last checked: Sun Jul 20 18:48:00 2014
Check interval: 0 (<none>)
Lifetime writes: 4088 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: c08e3b0a-2c23-4b0f-b2d6-9bb8f26e0b48
Journal backup: inode blocks
Journal features: journal_incompat_revoke journal_64bit
Journal size: 128M
Journal length: 32768
Journal sequence: 0x00229921
Journal start: 0
root@srv:/s# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Wed Jul 31 15:02:11 2013
Raid Level : raid6
Array Size : 23440599552 (22354.70 GiB 24003.17 GB)
Used Dev Size : 1953383296 (1862.89 GiB 2000.26 GB)
Raid Devices : 14
Total Devices : 14
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sun Jul 20 18:54:56 2014
State : active
Active Devices : 14
Working Devices : 14
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
Name : srv:0 (local to host srv)
UUID : a66b7f8a:dcf6b939:c14a87af:b21fcedf
Events : 303231
    Number   Major   Minor   RaidDevice State
       0       8       64        0      active sync   /dev/sde
       1       8      144        1      active sync   /dev/sdj
       2       8      160        2      active sync   /dev/sdk
      14       8      176        3      active sync   /dev/sdl
       4       8      192        4      active sync   /dev/sdm
       5       8      224        5      active sync   /dev/sdo
       6       8      208        6      active sync   /dev/sdn
       7      65        0        7      active sync   /dev/sdq
       8      65       16        8      active sync   /dev/sdr
       9      65       48        9      active sync   /dev/sdt
      13      65      112       10      active sync   /dev/sdx
      12       8       32       11      active sync   /dev/sdc
      16      65       32       12      active sync   /dev/sds
      15       8      240       13      active sync   /dev/sdp
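If my arithmetic is right, the array size matches the resize target
exactly: 23440599552 KiB * 1024 / 4096 = 5860149888 4 KiB blocks, i.e.
the 5860149888 figure dmesg reports, while the fs itself is stuck at
5804916736 blocks.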
The filesystem looks clean and everything is accessible. Though this is a
production box, nothing business-critical lives on this array, so we can
live without it mounted if someone can give me some things to try.
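In the meantime I'm happy to gather more state from the unmounted fs, for
example dumping the resize inode with debugfs (inode <7>, assuming I have
the right fixed inode number):

  debugfs -R 'stat <7>' /dev/md0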
Regards,
Brad