Re: unexpected 'mdadm -S' hang with I/O pressure testing

One thing to correct: the hang is not forever. After I posted the
previous email, all commands returned and the array stopped. It took
around 40 minutes -- still quite unexpected and suspicious.
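For what it's worth, while the commands were blocked the kernel's hung
task detector should have logged the stuck call chains; a minimal way
to collect them (assuming the default khungtaskd settings) is:

  # dmesg | grep -A 20 'blocked for more than'
  # cat /proc/$(pidof mdadm)/stack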

Thanks.

Coly Li

On 2020/9/12 22:06, Coly Li wrote:
> Unexpected Behavior:
> - With the Linux v5.9-rc4 mainline kernel and the latest mdadm upstream code
> - After running fio with 10 jobs, an iodepth of 16, and a 64K block size
> for a while, trying to stop the fio process with 'Ctrl + c' leaves the
> main fio process hung.
> - Then trying to stop the md raid5 array with 'mdadm -S /dev/md0' leaves
> the mdadm process hung.
> - After rebooting the system with 'echo b > /proc/sysrq-trigger', the md
> raid5 array is assembled but inactive. /proc/mdstat shows:
> 	Personalities : [raid6] [raid5] [raid4]
> 	md127 : inactive sdc[0] sde[3] sdd[1]
> 	      35156259840 blocks super 1.2
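> 
> For what it's worth, an inactive array left in this state can usually be
> released and re-assembled under its original name (a sketch, assuming
> the member disks are intact):
>   # mdadm -S /dev/md127
>   # mdadm -A /dev/md0 /dev/sd{c,d,e}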
> 
> Expectation:
> - The fio process can be stopped with 'Ctrl + c'
> - The raid5 array can be stopped by 'mdadm -S /dev/md0'
> - This md raid5 array continues to work (resyncing and becoming active)
> after reboot
> 
> 
> How to reproduce:
> 1) Create an md raid5 array with 3 hard drives (each a 12TB SATA
> spinning disk)
>   # mdadm -C /dev/md0 -l 5 -n 3 /dev/sd{c,d,e}
>   # cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sde[3] sdd[1] sdc[0]
>       23437506560 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
>       [>....................]  recovery =  0.0% (2556792/11718753280) finish=5765844.7min speed=33K/sec
>       bitmap: 2/88 pages [8KB], 65536KB chunk
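> 
> (As an aside, the resync speed shown above is heavily throttled; if
> needed it can usually be raised with the standard md sysctl, e.g.:
>   # echo 100000 > /proc/sys/dev/raid/speed_limit_min
> The value is in KB/s per device. This is just a convenience and not
> part of the reproducer.)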
> 
> 2) Run fio doing random writes on the raid5 array
>   fio job file content:
> [global]
> thread=1
> ioengine=libaio
> random_generator=tausworthe64
> 
> [job]
> filename=/dev/md0
> readwrite=randwrite
> blocksize=64K
> numjobs=10
> iodepth=16
> runtime=1m
>   # fio ./raid5.fio
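> 
> For reference, the same job should be expressible directly on the fio
> command line (equivalent to the job file above):
>   # fio --name=job --thread --ioengine=libaio \
>         --random_generator=tausworthe64 --filename=/dev/md0 \
>         --rw=randwrite --bs=64K --numjobs=10 --iodepth=16 --runtime=1m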
> 
> 3) Wait about 10 seconds after the above fio run starts, then press
> 'Ctrl + c' to stop the fio process:
> x:/home/colyli/fio_test/raid5 # fio ./raid5.fio
> job: (g=0): rw=randwrite, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=16
> ...
> fio-3.23-10-ge007
> Starting 12 threads
> ^Cbs: 12 (f=12): [w(12)][3.3%][w=6080KiB/s][w=95 IOPS][eta 14m:30s]
> fio: terminating on signal 2
> ^C
> fio: terminating on signal 2
> ^C
> fio: terminating on signal 2
> Jobs: 11 (f=11): [w(5),_(1),w(4),f(1),w(1)][7.5%][eta 14m:20s]
> ^C
> fio: terminating on signal 2
> Jobs: 11 (f=11): [w(5),_(1),w(4),f(1),w(1)][70.5%][eta 15m:00s]
> 
> Now the fio process hangs forever.
> 
> 4) Try to stop the md raid5 array with mdadm:
>   # mdadm -S /dev/md0
>   Now the mdadm process hangs forever.
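> 
> While mdadm is stuck, the blocked kernel stacks can be captured to
> attach to the report (a sketch, assuming sysrq is enabled):
>   # echo w > /proc/sysrq-trigger    # dump tasks in uninterruptible (D) state
>   # dmesg | tail -n 100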
> 
> 
> Kernel versions to reproduce:
> - Use the latest upstream mdadm source code
> - I tried Linux v5.9-rc4 and Linux v4.12; both of them stably reproduce
> the above unexpected behavior.
>   Therefore I assume at least v4.12 through v5.9 may have this issue.
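> 
> If some older kernel turns out to behave correctly, the offending
> commit could be narrowed down with a standard bisect (a sketch,
> assuming a known-good tag is found first):
>   # git bisect start
>   # git bisect bad v5.9-rc4
>   # git bisect good <known-good-tag>
>   (then build, boot, rerun the reproducer, and mark 'good' or 'bad')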
> 
> Just for your information; I hope you can have a look into it. Thanks
> in advance.
> 
> Coly Li
> 



