[mdadm] Reshape stuck after interrupted by reboot

Hi,

I was trying to reshape my 3-drive RAID5 into a 4-drive RAID6 when
there was a power outage.

Here is the command I used to restart the reshape after the reboot
(my mdadm version is 3.3):

mdadm /dev/md1 -AfR /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sdh3 \
--backup-file=/home/user/backup/reshape_backup

but I found the reshape stuck at the same progress point for hours. After a
few tries and after adding some logging, I found mdadm stuck in the select()
inside sysfs_wait():

	while (completed < max_progress && completed < wait_point) {
		/* Check that sync_action is still 'reshape' to avoid
		 * waiting forever on a dead array
		 */
		/* skip... */
		sysfs_wait(fd, NULL);
		^^^^^^^^^^^^^^^^^^^^^
		if (sysfs_fd_get_ll(fd, &completed) < 0)
			goto check_progress;
	}
	
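If I understand sysfs_wait() correctly, this is the usual sysfs
change-notification pattern: read the attribute to consume its current
contents, then select() on the fd with it in the exception set until the
kernel signals a change. Here is a minimal stand-alone sketch of that
pattern (my own illustration, not the actual mdadm code; the md1 path and
the timeout are just examples):

#include <fcntl.h>
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait for a sysfs attribute to change: consume the current contents,
 * then select() with the fd in the exception set until the kernel
 * notifies a change (or the timeout expires). */
static int wait_sysfs_change(int fd, int timeout_sec)
{
	char buf[64];
	fd_set fds;
	struct timeval tv = { .tv_sec = timeout_sec };

	lseek(fd, 0, SEEK_SET);
	if (read(fd, buf, sizeof(buf)) < 0)
		return -1;

	FD_ZERO(&fds);
	FD_SET(fd, &fds);
	return select(fd + 1, NULL, NULL, &fds, timeout_sec ? &tv : NULL);
}

int main(void)
{
	/* example attribute: sync_completed of md1 */
	int fd = open("/sys/block/md1/md/sync_completed", O_RDONLY);
	char buf[64];
	ssize_t n;

	if (fd < 0)
		return 1;
	while (wait_sysfs_change(fd, 100) >= 0) {
		lseek(fd, 0, SEEK_SET);
		n = read(fd, buf, sizeof(buf) - 1);
		if (n <= 0)
			break;
		buf[n] = '\0';
		printf("sync_completed: %s", buf);
	}
	close(fd);
	return 0;
}

If sync_completed never changes (because the kernel side is itself stuck),
that select() simply never returns, which matches what I see.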
This happens because, while monitoring the reshape, mdadm's first read of
sync_completed returned "delayed". mdadm then found that new_sync_max (the
sync_max value read back from the kernel) differed from max_progress, which
was still being calculated from the beginning, so *frozen was set. From that
point on,

	sysfs_set_num(info, NULL, "sync_max", max_progress);

is never executed again. That is why mdadm ends up blocked in sysfs_wait():
once max_progress reaches sync_completed, mdadm waits for further progress,
but the kernel still holds the old max_progress value in sync_max, so no
progress ever comes.
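In other words, the decision looks roughly like this (my own paraphrase of
the logic in progress_reshape(), not the literal Grow.c source; the helper
name is made up, the variable and sysfs helper names are the ones used
above):

#include "mdadm.h"

/* Hypothetical condensation of the sync_max handling described above. */
static void drive_reshape_step(struct mdinfo *info, int *frozen,
			       unsigned long long max_progress)
{
	unsigned long long new_sync_max;

	if (sysfs_get_ll(info, NULL, "sync_max", &new_sync_max) == 0 &&
	    new_sync_max != max_progress)
		/* The kernel's sync_max does not match what we computed,
		 * so mdadm assumes someone froze the reshape. */
		*frozen = 1;

	if (!*frozen)
		/* Once *frozen is set this write never happens again,
		 * so the kernel keeps its stale sync_max. */
		sysfs_set_num(info, NULL, "sync_max", max_progress);
}

With sync_max never updated, the kernel's sync thread sits in
wait_event_interruptible (visible in the kernel log below), sync_completed
stops moving, and the select() above never wakes up.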

Why would mdadm read sync_completed exactly during the transition of
curr_resync from value 2 to value 3, every single time I restarted the
reshape?

The weirdest thing is that I cannot reproduce this on another machine with
the same kernel and mdadm versions. The only difference between the two
environments is the physical machine. Is it possible this comes down to
clock frequency or some other hardware difference?

I have added some logging in the kernel and in mdadm (see below).

Thanks,

<kernel log>  linux-4.2.8

[   43.685260] [array_state_store] md1: entering ... 
[   43.690043] [array_state_store] md1: from active
[   43.694636] [array_state_store] md1: restart_array start from active
[   43.700986] md: md1 switched to read-write mode.
[   43.705775] [md_seq_show] md1: ===================
[   43.749593] [md_do_sync] md1: entering ....
[   43.753769] [md_do_sync] md1: last sync action = reshape
[   43.759055] [md_do_sync] md1: reshape !!
[   43.762960] [md_do_sync_0] md1:  mddev->curr_resync set to 2
[   43.772833] [md_do_sync_0] md1, mddev2 = md1 in
[   43.799217] [md_update_sb] md1: sysfs_notify sync_completed
[   43.799441] [md_seq_show] md1: ===================
[   43.799540] [sync_completed_show] md1: mddev->curr_resync = 2
[   43.799540] [sync_completed_show] md1: return delayed
							^^^^^^^
						mdadm got delayed here 
[   43.849369] [md_do_sync] md1: leave normally
[   43.849370] md: reshape of RAID array md1
[   43.849370] md: minimum _guaranteed_  speed: 200000 KB/sec/disk.
[   43.849371] md: using maximum available idle IO bandwidth (but not more than \
500000 KB/sec) for reshape. 
[   43.849371] md: Reshaping started: md1
[   43.849379] md: using 128k window, over a total of 966807552k.
[   43.849380] [md_do_sync_3] md1 : mddev->curr_resync set to 3
[   43.849391] [md_do_sync_0] md1: sysfs_notify sync_completed
[   43.849408] [reshape_request_2] before sysfs_notify sync_completed
[   43.849409] [reshape_request_2] after sysfs_notify sync_completed
[   43.849410] [md_do_sync] md1: j = 25739264, mddev->resync_max = 25739264
[   43.849410] [md_do_sync] md1: before wait_event_interruptible
[   43.849418] [sync_completed_show] md1: mddev->curr_resync_completed = 25739264, 
										resync = 25739264 
[   43.849479] [md_seq_show] md1: ================= 
[   43.849579] [sync_completed_show] md1: mddev->curr_resync_completed = 25739264, 
										resync = 25739264

<mdadm log>

[mdstat_read] mdname = md1
[mdstat_read] mdname = md1, (w = active), active
[mdstat_read] mdname = md1, 
[mdstat_read] mdname = md1, w = blocks, in_devs = 0
[mdstat_read] mdname = md1, (w = resync=DELAYED), resync
[mdstat_read] mdname = md1, (w = resync=DELAYED), RESYNC_DELAYED
[sysfs_fd_get_ll] n = 8, errno = 13, Permission denied
[sysfs_fd_get_ll] ep (delayed
) == buf (delayed
)
cannot read completed, ret1 = 1937339183, errno = 13
[progress_reshape] The abort might only be temporary, ret1 = 9, 
			ret2 = -57, buf = 51478528
[progress_reshape]  goto check_progress
[progress_reshape] all good again
[progress_reshape] new_sync_max = 25739264, max_progress = 262144
[progress_reshape] *frozen set

          == next iteration ==

[progress_reshape] frozen!!, will not set sync_max
                                           ^^^^^^^^^
                                          the kernel's sync_max still differs from mdadm's max_progress here

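For what it's worth, the sysfs_fd_get_ll failure above looks like just the
parse of the literal string "delayed" failing: a strtoull()-style conversion
consumes nothing, so the end pointer equals the buffer and the value is
rejected, which matches the "ep (delayed) == buf (delayed)" line in the log.
A quick stand-alone check of that (my own test, not mdadm code):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	char buf[] = "delayed\n";	/* what sync_completed returned */
	char *ep;
	unsigned long long v = strtoull(buf, &ep, 0);

	/* same style of end-pointer check as the log suggests */
	if (ep == buf || (*ep != 0 && *ep != '\n'))
		printf("parse failed, ep == buf: %d\n", ep == buf);
	else
		printf("value = %llu\n", v);
	return 0;
}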

