Re: Growing RAID10 with active XFS filesystem

On Mon, Jan 08, 2018 at 08:08:09PM +0100, xfs.pkoch@xxxxxxxx wrote:
> Dear Linux-Raid and Linux-XFS experts:
> 
> I'm posting this to both the linux-raid and linux-xfs
> mailing lists as it's not clear at this point whether
> this is an MD or an XFS problem.
> 
> I have described my problem in a recent posting on
> linux-raid and Wol's conclusion was:
> 
> >In other words, one or more of the following three are true :-
> >1) The OP has been caught by some random act of God
> >2) There's a serious flaw in "mdadm --grow"
> >3) There's a serious flaw in xfs
> >
> >Cheers,
> >Wol
> 
> There's very important data on our RAID10 device but I doubt
> it's important enough for God to take a hand in our storage.
> 
> But let me first summarize what happened and why I believe that
> this is an XFS problem:
> 
> Machine running Linux 3.14.69 with no kernel-patches.
> 
> XFS filesystem was created with XFS userutils 3.1.11.
> I did a fresh compile of xfsprogs-4.9.0 yesterday when
> I realized that the 3.1.11 xfs_repair did not help.
> 
> mdadm is V3.3
> 
> /dev/md5 is a RAID10 device that was created in Feb 2013
> with 10 2TB disks and an ext3 filesystem on it. Once in a
> while I added two more 2TB disks. Reshaping was done
> while the ext3 filesystem was mounted. Then the ext3
> filesystem was unmounted, resized, and mounted again. That
> worked until I resized the RAID10 from 16 to 20 disks and
> realized that ext3 does not support filesystems >16TB.
> 
> I switched to XFS and created a 20TB filesystem. Here are
> the details:
> 
> # xfs_info /dev/md5
> meta-data=/dev/md5               isize=256    agcount=32, agsize=152608128 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=4883457280, imaxpct=5
>          =                       sunit=128    swidth=1280 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=512   sunit=8 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> Please notice: This XFS filesystem has a size of
> 4883457280*4K = 19,533,829,120K
> 
> On Saturday I tried to add two more 2TB disks to the RAID10;
> the XFS filesystem was mounted (and in medium use) at the
> time. The commands were:
> 
> # mdadm /dev/md5 --add /dev/sdo
> # mdadm --grow /dev/md5 --raid-devices=21
> 
> # mdadm -D /dev/md5
> /dev/md5:
>          Version : 1.2
>    Creation Time : Sun Feb 10 16:58:10 2013
>       Raid Level : raid10
>       Array Size : 19533829120 (18628.91 GiB 20002.64 GB)
>    Used Dev Size : 1953382912 (1862.89 GiB 2000.26 GB)
>     Raid Devices : 21
>    Total Devices : 21
>      Persistence : Superblock is persistent
> 
>      Update Time : Sat Jan  6 15:08:37 2018
>            State : clean, reshaping
>   Active Devices : 21
> Working Devices : 21
>   Failed Devices : 0
>    Spare Devices : 0
> 
>           Layout : near=2
>       Chunk Size : 512K
> 
>   Reshape Status : 1% complete
>    Delta Devices : 1, (20->21)
> 
>             Name : backup:5  (local to host backup)
>             UUID : 9030ff07:6a292a3c:26589a26:8c92a488
>           Events : 86002
> 
>      Number   Major   Minor   RaidDevice State
>         0       8       16        0      active sync   /dev/sdb
>         1      65       48        1      active sync   /dev/sdt
>         2       8       64        2      active sync   /dev/sde
>         3      65       96        3      active sync   /dev/sdw
>         4       8      112        4      active sync   /dev/sdh
>         5      65      144        5      active sync   /dev/sdz
>         6       8      160        6      active sync   /dev/sdk
>         7      65      192        7      active sync   /dev/sdac
>         8       8      208        8      active sync   /dev/sdn
>         9      65      240        9      active sync   /dev/sdaf
>        10      65        0       10      active sync   /dev/sdq
>        11      66       32       11      active sync   /dev/sdai
>        12       8       32       12      active sync   /dev/sdc
>        13      65       64       13      active sync   /dev/sdu
>        14       8       80       14      active sync   /dev/sdf
>        15      65      112       15      active sync   /dev/sdx
>        16       8      128       16      active sync   /dev/sdi
>        17      65      160       17      active sync   /dev/sdaa
>        18       8      176       18      active sync   /dev/sdl
>        19      65      208       19      active sync   /dev/sdad
>        20       8      224       20      active sync   /dev/sdo
> 
> Please notice: This RAID10 device has a size of 19,533,829,120K,
> exactly the same size as the contained XFS filesystem.
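(For what it's worth, both numbers can be cross-checked straight from the
shell; a quick sketch, assuming the array is still assembled as /dev/md5:

  echo $(( 4883457280 * 4 ))      # fs size in KiB: data blocks * 4 KiB block size
  blockdev --getsize64 /dev/md5   # array size in bytes, for comparison

19533829120 KiB * 1024 = 20002641018880 bytes, which matches the capacity
the md driver reports below.)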
> 
> Immediately after the RAID10 reshape operation started, the
> XFS filesystem reported I/O errors and was severely damaged.
> I waited for the reshape operation to finish and tried to repair
> the filesystem with xfs_repair (version 3.1.11), but xfs_repair
> crashed, so I tried the 4.9.0 version of xfs_repair, with no luck
> either.
> 
> /dev/md5 is now mounted ro,norecovery with an overlay filesystem
> on top of it (thanks very much to Andreas for that idea) and I have
> set up a new server today. Rsyncing the data to the new server will
> take a while and I'm sure I will stumble on lots of corrupted files.
> I moved from XFS to ZFS (skipping YFS) so lengthy reshape
> operations won't happen anymore.
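(A read-only, no-log-recovery mount like that is indeed the safe way to poke
at a damaged XFS. Roughly, leaving the overlay setup itself aside and with
the mount point being just an example:

  mount -t xfs -o ro,norecovery /dev/md5 /mnt/rescue

The norecovery option skips log replay, so nothing gets written to the
already-damaged device.)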
> 
> Here are the relevant log messages:
> 
> >Jan  6 14:45:00 backup kernel: md: reshape of RAID array md5
> >Jan  6 14:45:00 backup kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> >Jan  6 14:45:00 backup kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
> >Jan  6 14:45:00 backup kernel: md: using 128k window, over a total of 19533829120k.
> >Jan  6 14:45:00 backup kernel: XFS (md5): metadata I/O error: block 0x12c08f360 ("xfs_trans_read_buf_map") error 5 numblks 16
> >Jan  6 14:45:00 backup kernel: XFS (md5): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> >Jan  6 14:45:00 backup kernel: XFS (md5): metadata I/O error: block 0x12c08f360 ("xfs_trans_read_buf_map") error 5 numblks 16
> >Jan  6 14:45:00 backup kernel: XFS (md5): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
> >... hundreds of the above XFS-messages deleted
> >Jan  6 14:45:00 backup kernel: XFS (md5): Log I/O Error Detected.  Shutting down filesystem
> >Jan  6 14:45:00 backup kernel: XFS (md5): Please umount the filesystem and rectify the problem(s)
> 
> Please notice: no error messages about hardware problems.
> All 21 disks are fine, and the next messages from the
> md driver were:
> 
> >Jan  7 02:28:02 backup kernel: md: md5: reshape done.
> >Jan  7 02:28:03 backup kernel: md5: detected capacity change from 20002641018880 to 21002772807680
> 
> I'm wondering about one thing: the first xfs message is about a
> metadata I/O error on block 0x12c08f360. Since the xfs filesystem

I'm sure Dave will have more to say about this, but...

"block 0x12c08f360" == units of sectors, not fs blocks.

IOWs, this IO error happened at offset 2,577,280,712,704 (~2.5TB)
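A quick way to see the two readings side by side, in plain shell arithmetic:

  echo $(( 0x12c08f360 * 512 ))   # 2577280712704 bytes (~2.5 TB): sectors * 512
  echo $(( 0x12c08f360 * 4 ))     # 20135005568 KiB: the fs-block reading used below

The first lands well inside the 20TB array; only the second, incorrect
reading ends up past the end of the device.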

XFS doesn't change the fs size until you tell it to (via growfs);
even if the underlying storage geometry changes, XFS won't act on it
until the admin tells it to.
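IOWs the intended sequence here would have been to let the reshape finish
and only then grow the fs by hand, something like (mount point is just an
example):

  cat /proc/mdstat        # make sure the reshape on md5 has completed
  xfs_growfs /mnt/backup  # only at this point does XFS start using the new space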

What did xfs_repair do?

--D

> has a blocksize of 4K, this block is located at position
> 20135005568K, which is beyond the end of the RAID10 device. No
> wonder the xfs driver receives an I/O error, and no wonder the
> filesystem is severely corrupted right now.
> 
> Question 1: How did the xfs driver know on Jan 6 that the RAID10
> device was about to be increased from 20TB to 21TB on Jan 7?
> 
> Question 2: Why did the xfs driver start to use additional
> space that was not yet there, without me executing xfs_growfs?
> 
> This looks like a severe XFS problem to me.
> 
> But my hope is that all the data that was within the filesystem
> before Jan 6 14:45 is not involved in the corruption. If xfs
> started to use space beyond the end of the underlying raid
> device, this should have affected only data that was created,
> modified or deleted after Jan 6 14:45.
> 
> If that were true, we could clearly distinguish between data
> that we must dump and data that we can keep. The machine is
> our backup system (as you may have guessed from its name)
> and I would like to keep old backup-files.
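(If that theory holds, the suspect files could be separated out during the
rsync with something along these lines; just a sketch, the mount point is an
example:

  find /mnt/rescue -type f -newermt "2018-01-06 14:45" > suspect-files.txt

Note that mtimes on corrupted files may themselves be unreliable, so treat
the resulting list as a starting point only.)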
> 
> I remember that mkfs.xfs is clever enough to adapt the
> filesystem parameters to the underlying hardware of the
> block device that the xfs filesystem is created on. Hence,
> from the xfs driver's point of view, the underlying block
> device is not just a sequence of data blocks; the xfs
> driver knows something about the layout of the underlying
> hardware.
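(That geometry is visible in the xfs_info output above: sunit=128 blocks and
swidth=1280 blocks are 512 KiB and 5 MiB, i.e. the md chunk size and the 10
data stripes of the 20-disk near-2 array. Spelled out by hand it would have
been something like:

  mkfs.xfs -d su=512k,sw=10 /dev/md5

Those values are only allocation/alignment hints, though; they don't tell
XFS how big the device is.)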
> 
> If that is true, how does the xfs driver react if that
> information about the layout of the underlying hardware
> changes while the xfs filesystem is mounted?
> 
> Seems to be an interesting problem
> 
> Kind regards
> 
> Peter Koch
> 