mdraid> I was under the impression that growing a RAID10 device could
mdraid> be done with an active filesystem running on the device.

It should be just fine. But in this case you might also want to talk
with the XFS experts.

mdraid> I did this a couple of times when I added additional 2TB disks
mdraid> to our production RAID10 running an ext3 filesystem. That was
mdraid> a very time-consuming process and we had to use the filesystem
mdraid> during the reshape.

What kernel and distro are you running here, and what version of the
mdadm tools? Please give more details.

mdraid> When I increased the size of the RAID10 from 16 to 20
mdraid> 2TB disks I could not use ext3 anymore due to the 16TB maximum
mdraid> size limitation of ext3, so I replaced the ext3 filesystem
mdraid> with XFS.

That must have been fun... not.

mdraid> Today I increased the RAID10 again from 20 to 21 disks with the
mdraid> following commands:
mdraid>
mdraid>   mdadm /dev/md5 --add /dev/sdo
mdraid>   mdadm --grow /dev/md5 --raid-devices=21
mdraid>
mdraid> My plan was to add another disk after that and then grow the
mdraid> XFS filesystem. I do not add multiple disks at once since it's
mdraid> hard to predict which disk will end up in which disk set.
mdraid>
mdraid> Here's the mdadm -D /dev/md5 output:
mdraid>
mdraid> /dev/md5:
mdraid>         Version : 1.2
mdraid>   Creation Time : Sun Feb 10 16:58:10 2013
mdraid>      Raid Level : raid10
mdraid>      Array Size : 19533829120 (18628.91 GiB 20002.64 GB)
mdraid>   Used Dev Size : 1953382912 (1862.89 GiB 2000.26 GB)
mdraid>    Raid Devices : 21
mdraid>   Total Devices : 21
mdraid>     Persistence : Superblock is persistent
mdraid>     Update Time : Sat Jan 6 15:08:37 2018
mdraid>           State : clean, reshaping
mdraid>  Active Devices : 21
mdraid> Working Devices : 21
mdraid>  Failed Devices : 0
mdraid>   Spare Devices : 0
mdraid>          Layout : near=2
mdraid>      Chunk Size : 512K
mdraid>  Reshape Status : 1% complete
mdraid>   Delta Devices : 1, (20->21)
mdraid>            Name : backup:5  (local to host backup)
mdraid>            UUID : 9030ff07:6a292a3c:26589a26:8c92a488
mdraid>          Events : 86002
mdraid>
mdraid>     Number   Major   Minor   RaidDevice State
mdraid>        0       8       16        0      active sync   /dev/sdb
mdraid>        1      65       48        1      active sync   /dev/sdt
mdraid>        2       8       64        2      active sync   /dev/sde
mdraid>        3      65       96        3      active sync   /dev/sdw
mdraid>        4       8      112        4      active sync   /dev/sdh
mdraid>        5      65      144        5      active sync   /dev/sdz
mdraid>        6       8      160        6      active sync   /dev/sdk
mdraid>        7      65      192        7      active sync   /dev/sdac
mdraid>        8       8      208        8      active sync   /dev/sdn
mdraid>        9      65      240        9      active sync   /dev/sdaf
mdraid>       10      65        0       10      active sync   /dev/sdq
mdraid>       11      66       32       11      active sync   /dev/sdai
mdraid>       12       8       32       12      active sync   /dev/sdc
mdraid>       13      65       64       13      active sync   /dev/sdu
mdraid>       14       8       80       14      active sync   /dev/sdf
mdraid>       15      65      112       15      active sync   /dev/sdx
mdraid>       16       8      128       16      active sync   /dev/sdi
mdraid>       17      65      160       17      active sync   /dev/sdaa
mdraid>       18       8      176       18      active sync   /dev/sdl
mdraid>       19      65      208       19      active sync   /dev/sdad
mdraid>       20       8      224       20      active sync   /dev/sdo

This all looks fine... but I'm thinking what you *should* have done
instead is build a bunch of 2TB RAID1 pairs, then use LVM to span
across them with a volume, and build your XFS filesystem on top of
that. That way you would have /dev/md1,2,3,4,5,6,7,8,9,10 all inside a
VG, and you would use LVM to stripe across the pairs. But that's water
under the bridge now. A rough sketch of that layout follows below.

mdraid> As you can see the array size is still 20TB.
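To make the pairs-plus-LVM idea concrete, here is a very rough sketch.
None of these names come from your setup -- the member disks, the md
numbers and the vg_backup/lv_backup names are all made up, and only two
pairs are shown:

  # build RAID1 pairs (repeat for each pair of disks)
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb /dev/sdt
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sde /dev/sdw

  # put all the pairs into one volume group
  pvcreate /dev/md1 /dev/md2
  vgcreate vg_backup /dev/md1 /dev/md2

  # one logical volume striped across the pairs, with XFS on top
  lvcreate --stripes 2 --extents 100%FREE --name lv_backup vg_backup
  mkfs.xfs /dev/vg_backup/lv_backup

Growing is then a matter of creating another pair, vgextend, lvextend
and xfs_growfs, and none of the existing md arrays ever has to be
reshaped at all.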
mdraid> Just one second after starting the reshape operation
mdraid> XFS failed with the following messages:

I *think* the mdadm --grow did something, or XFS noticed the change in
array size and grew on its own. Can you provide the output of
'xfs_info /dev/md5' for us?

mdraid> # dmesg
mdraid> ...
mdraid> RAID10 conf printout:
mdraid>  --- wd:21 rd:21
mdraid>  disk 0, wo:0, o:1, dev:sdb
mdraid>  disk 1, wo:0, o:1, dev:sdt
mdraid>  disk 2, wo:0, o:1, dev:sde
mdraid>  disk 3, wo:0, o:1, dev:sdw
mdraid>  disk 4, wo:0, o:1, dev:sdh
mdraid>  disk 5, wo:0, o:1, dev:sdz
mdraid>  disk 6, wo:0, o:1, dev:sdk
mdraid>  disk 7, wo:0, o:1, dev:sdac
mdraid>  disk 8, wo:0, o:1, dev:sdn
mdraid>  disk 9, wo:0, o:1, dev:sdaf
mdraid>  disk 10, wo:0, o:1, dev:sdq
mdraid>  disk 11, wo:0, o:1, dev:sdai
mdraid>  disk 12, wo:0, o:1, dev:sdc
mdraid>  disk 13, wo:0, o:1, dev:sdu
mdraid>  disk 14, wo:0, o:1, dev:sdf
mdraid>  disk 15, wo:0, o:1, dev:sdx
mdraid>  disk 16, wo:0, o:1, dev:sdi
mdraid>  disk 17, wo:0, o:1, dev:sdaa
mdraid>  disk 18, wo:0, o:1, dev:sdl
mdraid>  disk 19, wo:0, o:1, dev:sdad
mdraid>  disk 20, wo:1, o:1, dev:sdo
mdraid> md: reshape of RAID array md5
mdraid> md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
mdraid> md: using maximum available idle IO bandwidth (but not more
mdraid> than 200000 KB/sec) for reshape.
mdraid> md: using 128k window, over a total of 19533829120k.
mdraid> XFS (md5): metadata I/O error: block 0x12c08f360
mdraid> ("xfs_trans_read_buf_map") error 5 numblks 16
mdraid> XFS (md5): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
mdraid> XFS (md5): metadata I/O error: block 0x12c08f360
mdraid> ("xfs_trans_read_buf_map") error 5 numblks 16
mdraid> XFS (md5): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
mdraid> XFS (md5): metadata I/O error: block 0xebb62c00
mdraid> ("xfs_trans_read_buf_map") error 5 numblks 16
mdraid> XFS (md5): xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
mdraid> ...
mdraid> ... lots of the above messages deleted
mdraid> ...
mdraid> XFS (md5): xfs_do_force_shutdown(0x1) called from line 138 of file
mdraid> fs/xfs/xfs_bmap_util.c. Return address = 0xffffffff8113908f
mdraid> XFS (md5): metadata I/O error: block 0x48c710b00 ("xlog_iodone") error 5
mdraid> numblks 64
mdraid> XFS (md5): xfs_do_force_shutdown(0x2) called from line 1170 of file
mdraid> fs/xfs/xfs_log.c. Return address = 0xffffffff8117cdf4
mdraid> XFS (md5): Log I/O Error Detected. Shutting down filesystem
mdraid> XFS (md5): Please umount the filesystem and rectify the problem(s)
mdraid> XFS (md5): metadata I/O error: block 0x48c710b40 ("xlog_iodone") error 5
mdraid> numblks 64
mdraid> XFS (md5): xfs_do_force_shutdown(0x2) called from line 1170 of file
mdraid> fs/xfs/xfs_log.c. Return address = 0xffffffff8117cdf4
mdraid> XFS (md5): metadata I/O error: block 0x48c710b80 ("xlog_iodone") error 5
mdraid> numblks 64
mdraid> XFS (md5): xfs_do_force_shutdown(0x2) called from line 1170 of file
mdraid> fs/xfs/xfs_log.c. Return address = 0xffffffff8117cdf4
mdraid> XFS (md5): metadata I/O error: block 0x48c710bc0 ("xlog_iodone") error 5
mdraid> numblks 64
mdraid> XFS (md5): xfs_do_force_shutdown(0x2) called from line 1170 of file
mdraid> fs/xfs/xfs_log.c. Return address = 0xffffffff8117cdf4
mdraid> XFS (md5): metadata I/O error: block 0x48c710c00 ("xlog_iodone") error 5
mdraid> numblks 64
mdraid> XFS (md5): xfs_do_force_shutdown(0x2) called from line 1170 of file
mdraid> fs/xfs/xfs_log.c. Return address = 0xffffffff8117cdf4
mdraid> XFS (md5): metadata I/O error: block 0x48c710c40 ("xlog_iodone") error 5
mdraid> numblks 64
mdraid> XFS (md5): xfs_do_force_shutdown(0x2) called from line 1170 of file
mdraid> fs/xfs/xfs_log.c. Return address = 0xffffffff8117cdf4
mdraid> XFS (md5): metadata I/O error: block 0x48c710c80 ("xlog_iodone") error 5
mdraid> numblks 64
mdraid> XFS (md5): xfs_do_force_shutdown(0x2) called from line 1170 of file
mdraid> fs/xfs/xfs_log.c. Return address = 0xffffffff8117cdf4
mdraid> XFS (md5): metadata I/O error: block 0x48c710cc0 ("xlog_iodone") error 5
mdraid> numblks 64
mdraid> XFS (md5): xfs_do_force_shutdown(0x2) called from line 1170 of file
mdraid> fs/xfs/xfs_log.c. Return address = 0xffffffff8117cdf4
mdraid> XFS (md5): I/O Error Detected. Shutting down filesystem
mdraid>
mdraid> I did an "umount /dev/md5" and now I'm wondering what my options are:

What does 'xfs_repair -n /dev/md5' say?

mdraid> Should I wait until the reshape has finished? I assume yes,
mdraid> since stopping that operation will most likely make things
mdraid> worse. Unfortunately reshaping a 20TB RAID10 to 21TB will
mdraid> take about 10 hours, but it's Saturday and I have approx. 40
mdraid> hours to fix the problem before Monday morning.

Are you still having the problem?

mdraid> Should I reduce the array size back to 20 disks?

I don't think so.

mdraid> My plan is to run xfs_check first, maybe followed by
mdraid> xfs_repair, and see what happens.

Talk to the XFS folks first, before you do anything!

mdraid> Any other suggestions?
mdraid>
mdraid> Do you have an explanation why reshaping a RAID10 with a running
mdraid> ext3 filesystem works while a running XFS filesystem fails during
mdraid> a reshape?
mdraid>
mdraid> How did the XFS filesystem notice that a reshape was running? I was
mdraid> sure that during the reshape operation every single block of the RAID10
mdraid> device could be read or written, no matter whether it belongs to the
mdraid> part of the RAID that has already been reshaped or not. Obviously that
mdraid> works in theory only - or with ext3 filesystems only.
mdraid>
mdraid> Or was I totally wrong with my assumption?
mdraid>
mdraid> Many thanks in advance for any assistance.
mdraid>
mdraid> Peter Koch
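For what it's worth, once the reshape has finished, the sequence I
would try looks roughly like this. This is only a sketch -- the
/backup mount point is my assumption, and run the destructive repair
only after you have talked with the XFS list:

  # wait for the reshape to finish
  cat /proc/mdstat
  mdadm --detail /dev/md5

  # with the filesystem still unmounted, do a read-only check first
  xfs_repair -n /dev/md5

  # if xfs_repair complains about a dirty log, mounting and unmounting
  # once usually replays it; do not reach for -L without advice

  # only after the read-only pass looks sane: repair, mount, then grow
  xfs_repair /dev/md5
  mount /dev/md5 /backup
  xfs_growfs /backup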