Growing mdadm RAID5 to RAID6 and simultaneously adding space makes data inaccessible during grow

Hello RAID folks -

Using mdadm 4.2, I attempted to grow a four-drive RAID5 to RAID6 and
at the same time add another drive, by issuing

$ sudo mdadm --grow --raid-devices=6 --level=6 \
       --backup-file=/grow_md0.bak /dev/md0
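
For completeness, the two new drives had been added as spares
beforehand, roughly like this (the device names sdg and sdh are
reconstructed from the mdstat output further down):

$ # add the two new drives to md0 as spares (names assumed from mdstat)
$ sudo mdadm --add /dev/md0 /dev/sdg /dev/sdh
$ # confirm they show up as spares before growing
$ sudo mdadm --detail /dev/md0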

As sketched above, the two spare drives had already been added to
md0. Everything seemed to go well: the reshape passed the critical
section and no errors were reported. After a while, mdstat looked
like this:

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdc[0] sdg[5] sdh[6] sdd[4] sdb[3] sde[1]
      52734587904 blocks super 1.2 level 6, 512k chunk, algorithm 18 [6/5] [UUUU_U]
      [>....................]  reshape =  0.1% (17689088/17578195968) finish=3749331.8min speed=77K/sec
      bitmap: 0/262 pages [0KB], 32768KB chunk, file: /bitmapfile-ext-backups-md0

(By this time, I had manually throttled the reshape speed)
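
The throttling was done through the usual md speed limits, roughly
like this (the exact value is from memory):

$ # cap the global md resync/reshape rate, in KB/s
$ echo 100 | sudo tee /proc/sys/dev/raid/speed_limit_max
$ # or, per array:
$ echo 100 | sudo tee /sys/block/md0/md/sync_speed_max

At the resulting ~77 KB/s the numbers above also add up:
(17578195968 - 17689088) KB / 77 KB/s is roughly 3.8 million minutes,
i.e. years, in line with the finish estimate shown, so the limit is
clearly in effect.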

However, access to the filesystem mounted from /dev/md0 froze right
after I issued the grow command.

Reading below the reshape position (currently only about 69GB into
the array) works fine, but reads past that point block indefinitely,
and the syslog shows messages like this one:

kernel: [ 1451.122942] INFO: task (udev-worker):2934 blocked for more than 1087 seconds.
kernel: [ 1451.123010]       Tainted: P           O       6.5.0-14-generic #14-Ubuntu
kernel: [ 1451.123053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: [ 1451.123096] task:(udev-worker)   state:D stack:0     pid:2934  ppid:535    flags:0x00004006
kernel: [ 1451.123112] Call Trace:
kernel: [ 1451.123118]  <TASK>
kernel: [ 1451.123128]  __schedule+0x2cc/0x770
kernel: [ 1451.123154]  schedule+0x63/0x110
kernel: [ 1451.123166]  schedule_timeout+0x157/0x170
kernel: [ 1451.123181]  wait_woken+0x5f/0x70
kernel: [ 1451.123196]  raid5_make_request+0x225/0x450 [raid456]
kernel: [ 1451.123240]  ? __pfx_woken_wake_function+0x10/0x10
kernel: [ 1451.123257]  md_handle_request+0x139/0x220
kernel: [ 1451.123272]  md_submit_bio+0x63/0xb0
kernel: [ 1451.123281]  __submit_bio+0xe4/0x1c0
kernel: [ 1451.123292]  __submit_bio_noacct+0x90/0x230
kernel: [ 1451.123304]  submit_bio_noacct_nocheck+0x1ac/0x1f0
kernel: [ 1451.123318]  submit_bio_noacct+0x17f/0x5e0
kernel: [ 1451.123329]  submit_bio+0x4d/0x80
kernel: [ 1451.123337]  submit_bh_wbc+0x124/0x150
kernel: [ 1451.123350]  block_read_full_folio+0x33a/0x450
kernel: [ 1451.123363]  ? __pfx_blkdev_get_block+0x10/0x10
kernel: [ 1451.123379]  ? __pfx_blkdev_read_folio+0x10/0x10
kernel: [ 1451.123391]  blkdev_read_folio+0x18/0x30
kernel: [ 1451.123401]  filemap_read_folio+0x42/0xf0
kernel: [ 1451.123416]  filemap_update_page+0x1b7/0x280
kernel: [ 1451.123431]  filemap_get_pages+0x24f/0x3b0
kernel: [ 1451.123450]  filemap_read+0xe4/0x420
kernel: [ 1451.123463]  ? filemap_read+0x3d5/0x420
kernel: [ 1451.123484]  blkdev_read_iter+0x6d/0x160
kernel: [ 1451.123497]  vfs_read+0x20a/0x360
kernel: [ 1451.123517]  ksys_read+0x73/0x100
kernel: [ 1451.123531]  __x64_sys_read+0x19/0x30
kernel: [ 1451.123543]  do_syscall_64+0x59/0x90
kernel: [ 1451.123550]  ? do_syscall_64+0x68/0x90
kernel: [ 1451.123556]  ? syscall_exit_to_user_mode+0x37/0x60
kernel: [ 1451.123567]  ? do_syscall_64+0x68/0x90
kernel: [ 1451.123574]  ? syscall_exit_to_user_mode+0x37/0x60
kernel: [ 1451.123583]  ? do_syscall_64+0x68/0x90
kernel: [ 1451.123589]  ? syscall_exit_to_user_mode+0x37/0x60
kernel: [ 1451.123597]  ? do_syscall_64+0x68/0x90
kernel: [ 1451.123603]  ? do_user_addr_fault+0x17a/0x6b0
kernel: [ 1451.123612]  ? exit_to_user_mode_prepare+0x30/0xb0
kernel: [ 1451.123626]  ? irqentry_exit_to_user_mode+0x17/0x20
kernel: [ 1451.123635]  ? irqentry_exit+0x43/0x50
kernel: [ 1451.123643]  ? exc_page_fault+0x94/0x1b0
kernel: [ 1451.123652]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
kernel: [ 1451.123663] RIP: 0033:0x7f89e931a721
kernel: [ 1451.123713] RSP: 002b:00007fff8641dc48 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
kernel: [ 1451.123723] RAX: ffffffffffffffda RBX: 0000559b1ebd94a0 RCX: 00007f89e931a721
kernel: [ 1451.123729] RDX: 0000000000000040 RSI: 0000559b1ebf2418 RDI: 000000000000000d
kernel: [ 1451.123735] RBP: 0000311ce7cf0000 R08: fffffffffffffe18 R09: 0000000000000070
kernel: [ 1451.123741] R10: 0000559b1ebf2810 R11: 0000000000000246 R12: 0000559b1ebf23f0
kernel: [ 1451.123747] R13: 0000000000000040 R14: 0000559b1ebd94f8 R15: 0000559b1ebf2408
kernel: [ 1451.123762]  </TASK>

Reads from just before the reshape position are fast at first, then
slow down to roughly four times the reshape speed. I verified that
the first two btrfs superblock copies (one near the start of the
device and one at 64MB) are readable and intact. The last copy, at
256GB, lies past the reshape position and is inaccessible.
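
To be concrete, those checks were along these lines, assuming btrfs
sits directly on /dev/md0 and using the standard superblock mirror
offsets (64KiB, 64MiB, 256GiB):

$ # raw readability at the superblock offsets
$ sudo dd if=/dev/md0 bs=64K skip=1 count=1 of=/dev/null      # 64 KiB
$ sudo dd if=/dev/md0 bs=1M skip=64 count=1 of=/dev/null      # 64 MiB
$ sudo dd if=/dev/md0 bs=1M skip=262144 count=1 of=/dev/null  # 256 GiB (blocks)
$ # superblock contents via btrfs-progs (-s selects the mirror copy)
$ sudo btrfs inspect-internal dump-super -s 1 /dev/md0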

Rebooting and re-assembling the array led to exactly the same
situation: the reshape continues and the beginning of the array is
readable, but reads past the reshape point time out or block
indefinitely.
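
(Re-assembly after the reboot was done roughly like this; the member
list is reconstructed from mdstat and the backup file is the one
passed to --grow:)

$ sudo mdadm --assemble /dev/md0 /dev/sd[bcdegh] \
       --backup-file=/grow_md0.bak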

The array contains data that would be difficult or impossible to
recover from elsewhere, so above all I do not want to lose its
contents, but being able to access the data while this operation
runs would also be very useful. Is there a way to stop the reshape
and revert the array to the original 3+1-drive RAID5, so that I can
reach my data before the lengthy reshape runs its course?
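
For reference, the mdadm man page documents an --update=revert-reshape
option for --assemble; is something along these lines the right
direction here, or is it dangerous while a level change is in flight?

$ sudo mdadm --stop /dev/md0
$ # sketch only - please correct me if this is unsafe mid level-change
$ sudo mdadm --assemble /dev/md0 /dev/sd[bcdegh] \
       --update=revert-reshape --backup-file=/grow_md0.bak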

Thanks.

Matt



