unexpected 'mdadm -S' hang with I/O pressure testing

Unexpected Behavior:
- Seen with the Linux v5.9-rc4 mainline kernel and the latest upstream
mdadm code.
- After running fio with 10 jobs, an iodepth of 16 and a 64K block size
for a while, trying to stop the fio process with 'Ctrl + c' leaves the
main fio process hung.
- Trying to stop the md raid5 array with 'mdadm -S /dev/md0' then leaves
the mdadm process hung as well.
- After rebooting the system with 'echo b > /proc/sysrq-trigger', this md
raid5 array is assembled but inactive. /proc/mdstat shows,
	Personalities : [raid6] [raid5] [raid4]
	md127 : inactive sdc[0] sde[3] sdd[1]
	      35156259840 blocks super 1.2
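
A minimal sketch for checking this state from a script. The helper name
list_inactive_md is hypothetical (it is not part of mdadm); it only assumes
the "mdX : inactive ..." line layout shown above and prints the name of each
array reported as inactive.

```shell
# Print the name of every md array marked "inactive" in mdstat-formatted text.
# Reads the file given as $1, defaulting to /proc/mdstat.
# Hypothetical helper for illustration; not part of mdadm.
list_inactive_md() {
    # An array line looks like: "md127 : inactive sdc[0] sde[3] sdd[1]".
    # awk ignores leading whitespace, so indented copies of the file work too.
    awk '$1 ~ /^md/ && $2 == ":" && $3 == "inactive" { print $1 }' \
        "${1:-/proc/mdstat}"
}
```

Calling list_inactive_md with no argument reads the live /proc/mdstat; passing
a file name lets it run against a saved copy of the output.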

Expectation:
- The fio process should stop on 'Ctrl + c'
- The raid5 array should be stoppable with 'mdadm -S /dev/md0'
- This md raid5 array should continue to work (resync and become active)
after reboot


How to reproduce:
1) Create an md raid5 array from 3 hard drives (each a 12TB SATA spinning disk)
  # mdadm -C /dev/md0 -l 5 -n 3 /dev/sd{c,d,e}
  # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde[3] sdd[1] sdc[0]
      23437506560 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [>....................]  recovery =  0.0% (2556792/11718753280) finish=5765844.7min speed=33K/sec
      bitmap: 2/88 pages [8KB], 65536KB chunk

2) Run a fio random-write workload on the raid5 array
  fio job file content:
[global]
thread=1
ioengine=libaio
random_generator=tausworthe64

[job]
filename=/dev/md0
readwrite=randwrite
blocksize=64K
numjobs=10
iodepth=16
runtime=1m
  # fio ./raid5.fio

3) Wait for 10 seconds after the above fio job starts, then press 'Ctrl + c'
to stop the fio process:
x:/home/colyli/fio_test/raid5 # fio ./raid5.fio
job: (g=0): rw=randwrite, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB,
(T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=16
...
fio-3.23-10-ge007
Starting 12 threads
^Cbs: 12 (f=12): [w(12)][3.3%][w=6080KiB/s][w=95 IOPS][eta 14m:30s]
fio: terminating on signal 2
^C
fio: terminating on signal 2
^C
fio: terminating on signal 2
Jobs: 11 (f=11): [w(5),_(1),w(4),f(1),w(1)][7.5%][eta 14m:20s]
^C
fio: terminating on signal 2
Jobs: 11 (f=11): [w(5),_(1),w(4),f(1),w(1)][70.5%][eta 15m:00s]

Now the fio process hangs forever.

4) Try to stop this md raid5 array with mdadm
  # mdadm -S /dev/md0
  Now the mdadm process hangs forever as well.


Kernel versions to reproduce:
- Use the latest upstream mdadm source code
- I tried Linux v5.9-rc4 and Linux v4.12; both reliably reproduce the
unexpected behavior above.
  Therefore I assume that kernels from at least v4.12 to v5.9 may have this
issue.

This is just for your information; I hope you can take a look at it. Thanks
in advance.

Coly Li



