Lockup: 4.18, raid6 scrub vs raid6check

Hi,

kernel: v4.18.16
mdadm: current HEAD, 5d518de

It looks like there's some issue between scrubbing and writes to suspend_lo and suspend_hi, e.g. an inverted lock or a missed wakeup.

I had an md lockup on a production machine when running a scrub and raid6check on the same md device at the same time. I eventually had to reset the box to recover.

At the point of the hang, md1_raid6 was chewing up a lot of CPU but not making any progress (per /sys/block/md1/md/sync_completed), and the raid6check was unkillable (kill -9 didn't work); lsof showed it had /sys/devices/virtual/block/md1/md/suspend_lo open for write.
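For anyone reproducing this, one quick way to confirm the writer is wedged in the kernel is to check its state and kernel stack in /proc. This is a generic sketch, not from the original observation; the fallback to the current shell's pid is only so the commands always have a valid pid to illustrate with:

```shell
# Find the stuck raid6check; fall back to the current shell's pid so the
# commands below still have something to act on for illustration.
pid=$(pgrep -xo raid6check || echo $$)

# "State: D" (uninterruptible sleep) matches kill -9 having no effect.
grep '^State' "/proc/${pid}/status"

# Kernel stack of the blocked task (needs root); shows where it's waiting.
cat "/proc/${pid}/stack" 2>/dev/null || true
```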

The raid6check code writes to suspend_lo and suspend_hi in its lock_stripe() routine, to lock each stripe in turn as it works its way through the md device.
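The byte range suspended for a given stripe works out to stripe * chunk_size * data_disks through (stripe + 1) * chunk_size * data_disks, which is the same arithmetic the reproducer below uses. A tiny sketch of that calculation; the helper name and the example chunk/disk values are illustrative, not raid6check's own:

```shell
# Hypothetical helper: print the suspend_lo and suspend_hi byte offsets
# for one stripe, mirroring the arithmetic lock_stripe() performs.
stripe_range() {
  local stripe=$1 chunk_size=$2 data_disks=$3
  echo "$((stripe * chunk_size * data_disks)) $(((stripe + 1) * chunk_size * data_disks))"
}

# e.g. stripe 3 with a 512KiB (524288-byte) chunk and 9 data disks
# (an 11-device raid6):
stripe_range 3 524288 9   # prints "14155776 18874368"
```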

I'm able to reproduce the lockup on a debian9 single cpu kvm virtual machine in 2-10 rounds of the reproducer below.

The reproducer prints dots at intervals on the order of a few seconds. If the problem is hit, the dots stop coming. At that point the shell should have suspend_lo or suspend_hi open for write, and will be unkillable.
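To spot the hang without watching the dots, the sync_completed observation above can be turned into a simple check. A sketch only: md1 is the device from this setup, the interval is arbitrary, and the "unavailable" fallback is just so the snippet runs on a machine without that md device:

```shell
# Sample the scrub progress counter twice; if md1_raid6 is spinning but
# this value doesn't move, the lockup has hit.
f=/sys/block/md1/md/sync_completed
a=$(cat "$f" 2>/dev/null || echo unavailable)
sleep 2
b=$(cat "$f" 2>/dev/null || echo unavailable)
[ "$a" = "$b" ] && echo "sync_completed not moving: $a"
```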

Cheers,

Chris

----------------------------------------------------------------------
#
# Setup
#
# Create 6 x 11-dev raid6, wait for sync to finish
#
function test_setup
{
 for md in md{1..6}; do
   echo "creating ${md}"
   for i in {1..11}; do
     f=/var/tmp/${md}-vdev${i}
     truncate -s 2G "${f}"
     loop[$i]=$(losetup -f)
     losetup "${loop[$i]}" "${f}"
   done
   mdadm --create "/dev/${md}" --level=6 --raid-disks=11 "${loop[@]}"
 done
 while grep resync /proc/mdstat; do sleep 2; done
 cat /proc/mdstat
}

#
# Reproducer
#
# Continuous scrub of all mds, and lock successive stripes of md1 per
# raid6check:lock_stripe()
#
function test_run
{
 declare -i component_size=$(($(</sys/block/md1/md/component_size) * 1024)) # KiB to bytes
 declare -i chunk_size=$(</sys/block/md1/md/chunk_size)
 declare -i stripes=$((component_size / chunk_size))
 declare -i data_disks=$(($(</sys/block/md1/md/raid_disks) - 2))
 declare -i i=0 j stripe

 while : ; do
   i=$((i + 1))
   date +"%F-%T Round $i"

   #
   # Start scrub on all mds
   #
   for md in md{1..6}; do
     echo check > "/sys/block/${md}/md/sync_action"
   done
   sleep 2

   #
   # keep writing to md1 suspend_{lo,hi} as raid6check does
   #
   j=0
   while grep -q check /proc/mdstat; do
     j=$((j + 1))
     echo  -e "  $j \c"
     stripe=0
     while [[ stripe -le stripes ]] ; do
	[[ $((stripe % 10)) -eq 0 ]] && echo -e '.\c'
	echo $((stripe * chunk_size * data_disks)) > /sys/devices/virtual/block/md1/md/suspend_lo
	echo $(((stripe + 1) * chunk_size * data_disks)) > /sys/devices/virtual/block/md1/md/suspend_hi
	sleep 0.2
	stripe+=1
     done
     echo
   done
 done
}
----------------------------------------------------------------------


