Re: Growing mdadm RAID5 to RAID6 and simultaneously adding space makes data inaccessible during grow

Hey RAID folks,

For reference: the RAID5-to-RAID6 conversion described below, which
added capacity at the same time (two new disks in one go), eventually
completed its reshape.

At the end of that lengthy initial reshape phase, the data on the
array became accessible again. A second run then cleared the degraded
state, with the data remaining accessible throughout. No data was
ultimately lost, although the array was effectively out of service for
the few days the initial reshape took.
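
In case it helps anyone watching a similar reshape, progress can be
followed with the standard tools, for example:

$ watch cat /proc/mdstat
$ sudo mdadm --detail /dev/md0

The latter prints a "Reshape Status : N% complete" line while the
reshape is running. (/dev/md0 is the array name from this thread.)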

I thought this might be useful for others who end up in the same situation.

Thanks,

Matt

>
> Hello RAID folks -
>
> I took a stab at growing a four-drive RAID5 to RAID6 and at the same
> time adding another drive on mdadm 4.2, by issuing
>
> $ sudo mdadm --grow --raid-devices=6 --level=6 --backup-file=/grow_md0.bak /dev/md0
>
> Before that, two spare drives had been added to md0. All seemed to
> go well: the grow passed the critical section and no errors were
> reported. After a while, mdstat looked like this:
>
> $ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md0 : active raid6 sdc[0] sdg[5] sdh[6] sdd[4] sdb[3] sde[1]
>       52734587904 blocks super 1.2 level 6, 512k chunk, algorithm 18 [6/5] [UUUU_U]
>       [>....................]  reshape =  0.1% (17689088/17578195968) finish=3749331.8min speed=77K/sec
>       bitmap: 0/262 pages [0KB], 32768KB chunk, file: /bitmapfile-ext-backups-md0
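>
> (For completeness, the spares had been added beforehand with a
> command along the lines of
>
> $ sudo mdadm --add /dev/md0 /dev/sdg /dev/sdh
>
> the device names here are inferred from the two new slots, [5] and
> [6], in the mdstat output above.)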
>
> (By this time, I had manually throttled the reshape speed.)
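>
> (For anyone wondering, md reshape/resync speed can be throttled with
> the standard speed limits, e.g. system-wide:
>
> $ echo 1000 | sudo tee /proc/sys/dev/raid/speed_limit_max
>
> or per-array via /sys/block/md0/md/sync_speed_max; the value is in
> KB/s.)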
>
> Access to the filesystem mounted from /dev/md0, however, froze right
> after I issued the grow command.
>
> Reading before the reshape position (just about 69GB into the array)
> works well, but reads past that point block indefinitely and the
> syslog shows messages like this one:
>
> kernel: [ 1451.122942] INFO: task (udev-worker):2934 blocked for more than 1087 seconds.
> kernel: [ 1451.123010]       Tainted: P           O       6.5.0-14-generic #14-Ubuntu
> kernel: [ 1451.123053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kernel: [ 1451.123096] task:(udev-worker)   state:D stack:0     pid:2934  ppid:535    flags:0x00004006
> kernel: [ 1451.123112] Call Trace:
> kernel: [ 1451.123118]  <TASK>
> kernel: [ 1451.123128]  __schedule+0x2cc/0x770
> kernel: [ 1451.123154]  schedule+0x63/0x110
> kernel: [ 1451.123166]  schedule_timeout+0x157/0x170
> kernel: [ 1451.123181]  wait_woken+0x5f/0x70
> kernel: [ 1451.123196]  raid5_make_request+0x225/0x450 [raid456]
> kernel: [ 1451.123240]  ? __pfx_woken_wake_function+0x10/0x10
> kernel: [ 1451.123257]  md_handle_request+0x139/0x220
> kernel: [ 1451.123272]  md_submit_bio+0x63/0xb0
> kernel: [ 1451.123281]  __submit_bio+0xe4/0x1c0
> kernel: [ 1451.123292]  __submit_bio_noacct+0x90/0x230
> kernel: [ 1451.123304]  submit_bio_noacct_nocheck+0x1ac/0x1f0
> kernel: [ 1451.123318]  submit_bio_noacct+0x17f/0x5e0
> kernel: [ 1451.123329]  submit_bio+0x4d/0x80
> kernel: [ 1451.123337]  submit_bh_wbc+0x124/0x150
> kernel: [ 1451.123350]  block_read_full_folio+0x33a/0x450
> kernel: [ 1451.123363]  ? __pfx_blkdev_get_block+0x10/0x10
> kernel: [ 1451.123379]  ? __pfx_blkdev_read_folio+0x10/0x10
> kernel: [ 1451.123391]  blkdev_read_folio+0x18/0x30
> kernel: [ 1451.123401]  filemap_read_folio+0x42/0xf0
> kernel: [ 1451.123416]  filemap_update_page+0x1b7/0x280
> kernel: [ 1451.123431]  filemap_get_pages+0x24f/0x3b0
> kernel: [ 1451.123450]  filemap_read+0xe4/0x420
> kernel: [ 1451.123463]  ? filemap_read+0x3d5/0x420
> kernel: [ 1451.123484]  blkdev_read_iter+0x6d/0x160
> kernel: [ 1451.123497]  vfs_read+0x20a/0x360
> kernel: [ 1451.123517]  ksys_read+0x73/0x100
> kernel: [ 1451.123531]  __x64_sys_read+0x19/0x30
> kernel: [ 1451.123543]  do_syscall_64+0x59/0x90
> kernel: [ 1451.123550]  ? do_syscall_64+0x68/0x90
> kernel: [ 1451.123556]  ? syscall_exit_to_user_mode+0x37/0x60
> kernel: [ 1451.123567]  ? do_syscall_64+0x68/0x90
> kernel: [ 1451.123574]  ? syscall_exit_to_user_mode+0x37/0x60
> kernel: [ 1451.123583]  ? do_syscall_64+0x68/0x90
> kernel: [ 1451.123589]  ? syscall_exit_to_user_mode+0x37/0x60
> kernel: [ 1451.123597]  ? do_syscall_64+0x68/0x90
> kernel: [ 1451.123603]  ? do_user_addr_fault+0x17a/0x6b0
> kernel: [ 1451.123612]  ? exit_to_user_mode_prepare+0x30/0xb0
> kernel: [ 1451.123626]  ? irqentry_exit_to_user_mode+0x17/0x20
> kernel: [ 1451.123635]  ? irqentry_exit+0x43/0x50
> kernel: [ 1451.123643]  ? exc_page_fault+0x94/0x1b0
> kernel: [ 1451.123652]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> kernel: [ 1451.123663] RIP: 0033:0x7f89e931a721
> kernel: [ 1451.123713] RSP: 002b:00007fff8641dc48 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> kernel: [ 1451.123723] RAX: ffffffffffffffda RBX: 0000559b1ebd94a0 RCX: 00007f89e931a721
> kernel: [ 1451.123729] RDX: 0000000000000040 RSI: 0000559b1ebf2418 RDI: 000000000000000d
> kernel: [ 1451.123735] RBP: 0000311ce7cf0000 R08: fffffffffffffe18 R09: 0000000000000070
> kernel: [ 1451.123741] R10: 0000559b1ebf2810 R11: 0000000000000246 R12: 0000559b1ebf23f0
> kernel: [ 1451.123747] R13: 0000000000000040 R14: 0000559b1ebd94f8 R15: 0000559b1ebf2408
> kernel: [ 1451.123762]  </TASK>
>
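> (As an aside: the boundary below which reads still succeed is the
> current reshape position, which can be read directly from sysfs:
>
> $ cat /sys/block/md0/md/reshape_position
>
> it reports a sector number, or "none" when no reshape is in
> progress.)
>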
> Reads from just before the reshape position are fast at first, then
> proceed at roughly four times the reshape speed. I verified that the
> first two btrfs superblock copies on the partition (at the start of
> the drive and at 64MB) are readable and intact. The last one, at
> 256GB, is still past the reshape position and inaccessible.
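>
> (The superblock checks can be reproduced with btrfs-progs, e.g.
>
> $ sudo btrfs inspect-internal dump-super -s 1 /dev/md0
>
> where -s selects which of the three superblock copies, 0 through 2,
> to dump.)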
>
> Rebooting and re-assembling the array led to exactly the same
> situation: the reshape resumes, the beginning of the array is
> readable, and reads past the reshape point time out or block
> indefinitely.
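>
> (Re-assembly was of the usual form, roughly
>
> $ sudo mdadm --assemble /dev/md0 --backup-file=/grow_md0.bak /dev/sd[bcdegh]
>
> passing the same backup file that was given to --grow; the device
> list is assumed from the mdstat output above.)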
>
> The array contains data that would be difficult or impossible to
> recover from elsewhere, so I do not want to lose the array's
> contents, but being able to access the data during this operation
> would also be really useful. Is there a way to stop the reshape and
> revert the array to its original 3+1-drive RAID5 layout, restoring
> access to my data without waiting for the lengthy reshape to run its
> course?
>
> Thanks.
>
> Matt



