Chris Dunlop <chris@xxxxxxxxxxxx> wrote on Fri, Oct 26, 2018 at 1:28 PM:
>
> On Thu, Oct 25, 2018 at 03:47:39PM +0200, Jack Wang wrote:
> > Chris Dunlop <chris@xxxxxxxxxxxx> wrote on Thu, Oct 25, 2018 at 2:52 PM:
> >>
> >> Hi,
> >>
> >> kernel: v4.18.16
> >> mdadm: current HEAD, 5d518de
> > Hi,
> >
> > Possibly related to
> > https://git.kernel.org/pub/scm/linux/kernel/git/shli/md.git/commit/?h=for-next&id=059421e041eb461fb2b3e81c9adaec18ef03ca3c
> >
> > Can you try to apply the patch to your kernel to see if it fixes your problem?
>
> Yep, that fixes it. Survived 50+ rounds of my reproducer, which hadn't
> survived 10 rounds without this patch.

Hi Chris,

Thanks for testing and letting me know.

> Thanks,
>
> Chris

Regards,
Jack

> >
> > Regards,
> >
> > Jack
> >
> >> It looks like there's some issue between scrubbing and writing to
> >> suspend_lo and suspend_hi, e.g. an inverted lock or missed wakeup etc.
> >>
> >> I had an md lockup on a production machine when running a scrub and
> >> raid6check on the same md device at the same time. I eventually had to
> >> reset the box to recover.
> >>
> >> At the point of the hang, md1_raid6 was chewing up a lot of cpu but not
> >> making any progress (per /sys/block/md1/md/sync_completed), and the
> >> raid6check was unkillable (kill -9 didn't work) and lsof showed it had
> >> /sys/devices/virtual/block/md1/md/suspend_lo open for write.
> >>
> >> The raid6check code writes to suspend_lo and suspend_hi in its
> >> lock_stripe() routine, to lock each stripe in turn as it works its way
> >> through the md device.
> >>
> >> I'm able to reproduce the lockup on a debian9 single-cpu kvm virtual
> >> machine in 2-10 rounds of the reproducer below.
> >>
> >> The reproducer prints dots at intervals on the order of a few seconds. If
> >> the problem is hit, the dots stop coming. At that point the shell should
> >> have suspend_lo or suspend_hi open for write, and will be unkillable.
> >>
> >> Cheers,
> >>
> >> Chris
> >>
> >> ----------------------------------------------------------------------
> >> #
> >> # Setup
> >> #
> >> # Create 6 x 11-dev raid6, wait for sync to finish
> >> #
> >> function test_setup
> >> {
> >>     for md in md{1..6}; do
> >>         echo "creating ${md}"
> >>         for i in {1..11}; do
> >>             f=/var/tmp/${md}-vdev${i}
> >>             truncate -s 2G "${f}"
> >>             loop[$i]=$(losetup -f)
> >>             losetup "${loop[$i]}" "${f}"
> >>         done
> >>         mdadm --create "/dev/${md}" --level=6 --raid-disks=11 "${loop[@]}"
> >>     done
> >>     while grep resync /proc/mdstat; do sleep 2; done
> >>     cat /proc/mdstat
> >> }
> >>
> >> #
> >> # Reproducer
> >> #
> >> # Continuous scrub of all mds, and lock successive stripes of md1 per
> >> # raid6check:lock_stripe()
> >> #
> >> function test_run
> >> {
> >>     declare -i component_size=$(($(</sys/block/md1/md/component_size) * 1024))  # KB to bytes
> >>     declare -i chunk_size=$(</sys/block/md1/md/chunk_size)
> >>     declare -i stripes=$((component_size / chunk_size))
> >>     declare -i data_disks=$(($(</sys/block/md1/md/raid_disks) - 2))
> >>     declare -i i=0 j stripe
> >>
> >>     while : ; do
> >>         i=$((i + 1))
> >>         date +"%F-%T Round $i"
> >>
> >>         #
> >>         # Start scrub on all mds
> >>         #
> >>         for md in md{1..6}; do
> >>             echo check > "/sys/block/${md}/md/sync_action"
> >>         done
> >>         sleep 2
> >>
> >>         #
> >>         # Keep writing to md1 suspend_{lo,hi} as raid6check does
> >>         #
> >>         j=0
> >>         while grep -q check /proc/mdstat; do
> >>             j=$((j + 1))
> >>             echo -e " $j \c"
> >>             stripe=0
> >>             while [[ stripe -le stripes ]]; do
> >>                 [[ $((stripe % 10)) -eq 0 ]] && echo -e '.\c'
> >>                 echo $((stripe * chunk_size * data_disks)) > /sys/devices/virtual/block/md1/md/suspend_lo
> >>                 echo $(((stripe + 1) * chunk_size * data_disks)) > /sys/devices/virtual/block/md1/md/suspend_hi
> >>                 sleep 0.2
> >>                 stripe+=1
> >>             done
> >>             echo
> >>         done
> >>     done
> >> }
> >> ----------------------------------------------------------------------