Chris Dunlop <chris@xxxxxxxxxxxx> wrote on Fri, Oct 26, 2018 at 1:28 PM:
>
> On Thu, Oct 25, 2018 at 03:47:39PM +0200, Jack Wang wrote:
> > Chris Dunlop <chris@xxxxxxxxxxxx> wrote on Thu, Oct 25, 2018 at 2:52 PM:
> >>
> >> Hi,
> >>
> >> kernel: v4.18.16
> >> mdadm: current HEAD, 5d518de
> > Hi,
> >
> > Possibly related to
> > https://git.kernel.org/pub/scm/linux/kernel/git/shli/md.git/commit/?h=for-next&id=059421e041eb461fb2b3e81c9adaec18ef03ca3c
> >
> > Can you try to apply the patch to your kernel to see if it fixes your problem?
>
> Yep, that fixes it. Survived 50+ rounds of my reproducer, which hadn't
> survived 10 rounds without this patch.

Hi Chris,

Thanks for testing and letting me know.

> Thanks,
>
> Chris

Regards,
Jack

> >
> > Regards,
> >
> > Jack
> >
> >> It looks like there's some issue between scrubbing and writing to
> >> suspend_lo and suspend_hi, e.g. an inverted lock or missed wakeup etc.
> >>
> >> I had an md lockup on a production machine when running a scrub and
> >> raid6check on the same md device at the same time. I eventually had to
> >> reset the box to recover.
> >>
> >> At the point of the hang, md1_raid6 was chewing up a lot of cpu but not
> >> making any progress (per /sys/block/md1/md/sync_completed), and the
> >> raid6check was unkillable (kill -9 didn't work) and lsof showed it had
> >> /sys/devices/virtual/block/md1/md/suspend_lo open for write.
> >>
> >> The raid6check code writes to suspend_lo and suspend_hi in its
> >> lock_stripe() routine, to lock each stripe in turn as it works its way
> >> through the md device.
> >>
> >> I'm able to reproduce the lockup on a debian9 single-cpu kvm virtual
> >> machine in 2-10 rounds of the reproducer below.
> >>
> >> The reproducer prints dots at intervals on the order of a few seconds. If
> >> the problem is hit, the dots stop coming. At that point the shell should
> >> have suspend_lo or suspend_hi open for write, and will be unkillable.
> >>
> >> Cheers,
> >>
> >> Chris
> >>
> >> ----------------------------------------------------------------------
> >> #
> >> # Setup
> >> #
> >> # Create 6 x 11-dev raid6, wait for sync to finish
> >> #
> >> function test_setup
> >> {
> >>     for md in md{1..6}; do
> >>         echo "creating ${md}"
> >>         for i in {1..11}; do
> >>             f=/var/tmp/${md}-vdev${i}
> >>             truncate -s 2G "${f}"
> >>             loop[$i]=$(losetup -f)
> >>             losetup "${loop[$i]}" "${f}"
> >>         done
> >>         mdadm --create "/dev/${md}" --level=6 --raid-disks=11 "${loop[@]}"
> >>     done
> >>     while grep resync /proc/mdstat; do sleep 2; done
> >>     cat /proc/mdstat
> >> }
> >>
> >> #
> >> # Reproducer
> >> #
> >> # Continuous scrub of all mds, and lock successive stripes of md1 per
> >> # raid6check:lock_stripe()
> >> #
> >> function test_run
> >> {
> >>     declare -i component_size=$(($(</sys/block/md1/md/component_size) * 1024))  # KB to bytes
> >>     declare -i chunk_size=$(</sys/block/md1/md/chunk_size)
> >>     declare -i stripes=$((component_size / chunk_size))
> >>     declare -i data_disks=$(($(</sys/block/md1/md/raid_disks) - 2))
> >>     declare -i i=0 j stripe
> >>
> >>     while : ; do
> >>         i=$((i + 1))
> >>         date +"%F-%T Round $i"
> >>
> >>         #
> >>         # Start scrub on all mds
> >>         #
> >>         for md in md{1..6}; do
> >>             echo check > "/sys/block/${md}/md/sync_action"
> >>         done
> >>         sleep 2
> >>
> >>         #
> >>         # Keep writing to md1 suspend_{lo,hi} as raid6check does
> >>         #
> >>         j=0
> >>         while grep -q check /proc/mdstat; do
> >>             j=$((j + 1))
> >>             echo -e " $j \c"
> >>             stripe=0
> >>             while [[ stripe -le stripes ]]; do
> >>                 [[ $((stripe % 10)) -eq 0 ]] && echo -e '.\c'
> >>                 echo $((stripe * chunk_size * data_disks)) > /sys/devices/virtual/block/md1/md/suspend_lo
> >>                 echo $(((stripe + 1) * chunk_size * data_disks)) > /sys/devices/virtual/block/md1/md/suspend_hi
> >>                 sleep 0.2
> >>                 stripe+=1
> >>             done
> >>             echo
> >>         done
> >>     done
> >> }
> >> ----------------------------------------------------------------------