Hi,
kernel: v4.18.16
mdadm: current HEAD, 5d518de
There appears to be a bad interaction between scrubbing and writing to
suspend_lo and suspend_hi, e.g. a lock inversion or a missed wakeup.
I had an md lockup on a production machine when running a scrub and
raid6check on the same md device at the same time. I eventually had to
reset the box to recover.
At the point of the hang, md1_raid6 was chewing up a lot of CPU but not
making any progress (per /sys/block/md1/md/sync_completed), raid6check
was unkillable (kill -9 didn't work), and lsof showed it had
/sys/devices/virtual/block/md1/md/suspend_lo open for write.
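The "no progress" condition can be checked without watching by eye, by
sampling sync_completed twice and comparing. A minimal sketch; the name
sync_stalled and the 10 second default interval are my own choices for
illustration, not part of md or mdadm:

```shell
#!/bin/bash
# Hypothetical helper: report whether a progress file changed between two reads.
# Usage: sync_stalled <path-to-sync_completed> [interval-seconds]
sync_stalled()
{
    local file="$1" interval="${2:-10}" before after
    before=$(cat "$file")
    sleep "$interval"
    after=$(cat "$file")
    if [ "$before" = "$after" ]; then
        # Same contents on both reads: no progress during the interval
        echo "stalled at: $after"
        return 0
    fi
    echo "progressing: $before -> $after"
    return 1
}
```

For example: sync_stalled /sys/block/md1/md/sync_completed 10. An idle
array would also look stalled, so this is only meaningful while
sync_action reports check.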
The raid6check code writes to suspend_lo and suspend_hi in its
lock_stripe() routine, to lock each stripe in turn as it works its way
through the md device.
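For reference, the pair of values written for one stripe can be computed
as follows. This is a sketch that mirrors the arithmetic in the
reproducer's inner loop rather than the raid6check source itself; the
name stripe_window is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the suspend_lo and suspend_hi values for one
# stripe, using the same arithmetic as the reproducer's inner loop.
# Usage: stripe_window <stripe> <chunk_size> <data_disks>
stripe_window()
{
    local stripe="$1" chunk_size="$2" data_disks="$3"
    local lo=$((stripe * chunk_size * data_disks))
    local hi=$(((stripe + 1) * chunk_size * data_disks))
    echo "$lo $hi"
}
```

With a chunk_size of 524288 and 9 data disks (an 11-device raid6),
stripe_window 3 524288 9 prints "14155776 18874368".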
I'm able to reproduce the lockup on a Debian 9 single-CPU KVM virtual
machine within 2-10 rounds of the reproducer below.
The reproducer prints dots at intervals on the order of a few seconds. If
the problem is hit, the dots stop coming. At that point the shell should
have suspend_lo or suspend_hi open for write, and will be unkillable.
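To confirm which file the stuck shell holds open without lsof, the
symlinks under /proc/<pid>/fd can be read directly. A rough sketch,
Linux only; the name fds_open_on is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: list the file descriptor numbers of a process that
# point at a given path, by reading the symlinks under /proc/<pid>/fd.
# Usage: fds_open_on <pid> <absolute-path>
fds_open_on()
{
    local pid="$1" target="$2" fd
    for fd in /proc/"$pid"/fd/*; do
        if [ "$(readlink "$fd")" = "$target" ]; then
            # Strip the directory part, leaving just the fd number
            echo "${fd##*/}"
        fi
    done
}
```

For example: fds_open_on "$pid" /sys/devices/virtual/block/md1/md/suspend_lo.
Running ps -o pid,stat,wchan:32 -p "$pid" alongside it shows the process
state; a state of D (uninterruptible sleep) would be consistent with
kill -9 having no effect.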
Cheers,
Chris
----------------------------------------------------------------------
#
# Setup
#
# Create 6 x 11-dev raid6, wait for sync to finish
#
function test_setup
{
    for md in md{1..6}; do
        echo "creating ${md}"
        for i in {1..11}; do
            f=/var/tmp/${md}-vdev${i}
            truncate -s 2G "${f}"
            loop[$i]=$(losetup -f)
            losetup "${loop[$i]}" "${f}"
        done
        mdadm --create "/dev/${md}" --level=6 --raid-disks=11 "${loop[@]}"
    done
    while grep resync /proc/mdstat; do sleep 2; done
    cat /proc/mdstat
}
#
# Reproducer
#
# Continuous scrub of all mds, and lock successive stripes of md1 per
# raid6check:lock_stripe()
#
function test_run
{
    declare -i component_size=$(($(</sys/block/md1/md/component_size) * 1024)) # KB to bytes
    declare -i chunk_size=$(</sys/block/md1/md/chunk_size)
    declare -i stripes=$((component_size / chunk_size))
    declare -i data_disks=$(($(</sys/block/md1/md/raid_disks) - 2))
    declare -i i=0 j stripe
    while : ; do
        i=$((i + 1))
        date +"%F-%T Round $i"
        #
        # Start scrub on all mds
        #
        for md in md{1..6}; do
            echo check > "/sys/block/${md}/md/sync_action"
        done
        sleep 2
        #
        # keep writing to md1 suspend_{lo,hi} as raid6check does
        #
        j=0
        while grep -q check /proc/mdstat; do
            j=$((j + 1))
            echo -e " $j \c"
            stripe=0
            while [[ stripe -le stripes ]] ; do
                [[ $((stripe % 10)) -eq 0 ]] && echo -e '.\c'
                echo $((stripe * chunk_size * data_disks)) > /sys/devices/virtual/block/md1/md/suspend_lo
                echo $(((stripe + 1) * chunk_size * data_disks)) > /sys/devices/virtual/block/md1/md/suspend_hi
                sleep 0.2
                stripe+=1
            done
            echo
        done
    done
}
----------------------------------------------------------------------