Chris Dunlop <chris@xxxxxxxxxxxx> wrote on Thu, 25 Oct 2018 at 2:52 PM:
>
> Hi,
>
> kernel: v4.18.16
> mdadm: current HEAD, 5d518de

Hi,

This is possibly related to
https://git.kernel.org/pub/scm/linux/kernel/git/shli/md.git/commit/?h=for-next&id=059421e041eb461fb2b3e81c9adaec18ef03ca3c

Can you try applying the patch to your kernel to see if it fixes your problem?

Regards,
Jack

> It looks like there's some issue between scrubbing and writing to
> suspend_lo and suspend_hi, e.g. an inverted lock or missed wakeup etc.
>
> I had an md lockup on a production machine when running a scrub and
> raid6check on the same md device at the same time. I eventually had to
> reset the box to recover.
>
> At the point of the hang, md1_raid6 was chewing up a lot of CPU but not
> making any progress (per /sys/block/md1/md/sync_completed), and the
> raid6check process was unkillable (kill -9 didn't work); lsof showed it had
> /sys/devices/virtual/block/md1/md/suspend_lo open for write.
>
> The raid6check code writes to suspend_lo and suspend_hi in its
> lock_stripe() routine, to lock each stripe in turn as it works its way
> through the md device.
>
> I'm able to reproduce the lockup on a debian9 single-cpu kvm virtual
> machine in 2-10 rounds of the reproducer below.
>
> The reproducer prints dots at intervals on the order of a few seconds. If
> the problem is hit, the dots stop coming. At that point the shell should
> have suspend_lo or suspend_hi open for write, and will be unkillable.
>
> Cheers,
>
> Chris
>
> ----------------------------------------------------------------------
> #
> # Setup
> #
> # Create 6 x 11-dev raid6, wait for sync to finish
> #
> function test_setup
> {
>     for md in md{1..6}; do
>         echo "creating ${md}"
>         for i in {1..11}; do
>             f=/var/tmp/${md}-vdev${i}
>             truncate -s 2G "${f}"
>             loop[$i]=$(losetup -f)
>             losetup "${loop[$i]}" "${f}"
>         done
>         mdadm --create "/dev/${md}" --level=6 --raid-disks=11 "${loop[@]}"
>     done
>     while grep resync /proc/mdstat; do sleep 2; done
>     cat /proc/mdstat
> }
>
> #
> # Reproducer
> #
> # Continuous scrub of all mds, and lock successive stripes of md1 per
> # raid6check:lock_stripe()
> #
> function test_run
> {
>     declare -i component_size=$(($(</sys/block/md1/md/component_size) * 1024))  # KB to bytes
>     declare -i chunk_size=$(</sys/block/md1/md/chunk_size)
>     declare -i stripes=$((component_size / chunk_size))
>     declare -i data_disks=$(($(</sys/block/md1/md/raid_disks) - 2))
>     declare -i i=0 j stripe
>
>     while : ; do
>         i=$((i + 1))
>         date +"%F-%T Round $i"
>
>         #
>         # Start scrub on all mds
>         #
>         for md in md{1..6}; do
>             echo check > "/sys/block/${md}/md/sync_action"
>         done
>         sleep 2
>
>         #
>         # keep writing to md1 suspend_{lo,hi} as raid6check does
>         #
>         j=0
>         while grep -q check /proc/mdstat; do
>             j=$((j + 1))
>             echo -e " $j \c"
>             stripe=0
>             while [[ stripe -le stripes ]]; do
>                 [[ $((stripe % 10)) -eq 0 ]] && echo -e '.\c'
>                 echo $((stripe * chunk_size * data_disks)) > /sys/devices/virtual/block/md1/md/suspend_lo
>                 echo $(((stripe + 1) * chunk_size * data_disks)) > /sys/devices/virtual/block/md1/md/suspend_hi
>                 sleep 0.2
>                 stripe+=1
>             done
>             echo
>         done
>     done
> }
> ----------------------------------------------------------------------
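For anyone eyeballing the suspend_{lo,hi} values the reproducer writes, the per-stripe byte range can be computed standalone. The numbers below are assumed example values only (512 KiB chunks, which is mdadm's default, and 9 data disks, i.e. 11 raid-disks minus 2 parity for RAID6), not taken from the report:

```shell
# Byte range written to suspend_lo/suspend_hi for one stripe, as in the
# reproducer's inner loop. chunk_size and data_disks are assumed
# example values.
chunk_size=524288   # 512 KiB chunk (assumed; read from sysfs in the real script)
data_disks=9        # 11 raid-disks minus 2 RAID6 parity disks
stripe=3

lo=$((stripe * chunk_size * data_disks))
hi=$(((stripe + 1) * chunk_size * data_disks))
echo "suspend_lo=$lo suspend_hi=$hi"   # suspend_lo=14155776 suspend_hi=18874368
```

So each iteration suspends exactly one full data stripe (chunk_size * data_disks bytes) before moving to the next.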
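Not part of the original report, but for completeness: a sketch of tearing the test setup down between runs (device names and file paths assumed to match test_setup above; needs root):

```shell
# Teardown sketch for the setup created by test_setup above.
# Assumes the md names and /var/tmp backing-file paths from that script.
function test_teardown
{
    for md in md{1..6}; do
        mdadm --stop "/dev/${md}"
    done
    # Detach the loop devices backing each test file, then remove the file.
    for f in /var/tmp/md{1..6}-vdev{1..11}; do
        losetup -j "${f}" | cut -d: -f1 | xargs -r losetup -d
        rm -f "${f}"
    done
}
```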