----- Original Message -----
> From: "NeilBrown" <neilb@xxxxxxxx>
> To: "Xiao Ni" <xni@xxxxxxxxxx>
> Cc: linux-raid@xxxxxxxxxxxxxxx
> Sent: Thursday, September 14, 2017 7:05:20 AM
> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
>
> On Wed, Sep 13 2017, Xiao Ni wrote:
>
> > Hi Neil
> >
> > Sorry for the bad news. The test is still running and it's stuck again.
>
> Any details? Anything at all? Just a little hint maybe?
>
> Just saying "it's stuck again" is very nearly useless.
>

Hi Neil

There isn't any useful information in /var/log/messages. I also enabled
dynamic debug output for raid5.c with

echo file raid5.c +p > /sys/kernel/debug/dynamic_debug/control

but no messages showed up either. It looks like a different problem.

[root@dell-pr1700-02 ~]# ps auxf | grep D
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      8381  0.0  0.0      0     0 ?        D    Sep13   0:00  \_ [kworker/u8:1]
root      8966  0.0  0.0      0     0 ?        D    Sep13   0:00  \_ [jbd2/md0-8]
root       824  0.0  0.1 216856  8492 ?        Ss   Sep03   0:06 /usr/bin/abrt-watch-log -F BUG: WARNING: at WARNING: CPU: INFO: possible recursive locking detected ernel BUG at list_del corruption list_add corruption do_IRQ: stack overflow: ear stack overflow (cur: eneral protection fault nable to handle kernel ouble fault: RTNL: assertion failed eek! page_mapcount(page) went negative! adness at NETDEV WATCHDOG ysctl table check failed : nobody cared IRQ handler type mismatch Machine Check Exception: Machine check events logged divide error: bounds: coprocessor segment overrun: invalid TSS: segment not present: invalid opcode: alignment check: stack segment: fpu exception: simd exception: iret exception: /var/log/messages -- /usr/bin/abrt-dump-oops -xtD
root       836  0.0  0.0 195052  3200 ?        Ssl  Sep03   0:00 /usr/sbin/gssproxy -D
root      1225  0.0  0.0 106008  7436 ?        Ss   Sep03   0:00 /usr/sbin/sshd -D
root     12411  0.0  0.0 112672  2264 pts/0    S+   00:50   0:00  \_ grep --color=auto D
root      8987  0.0  0.0 109000  2728 pts/2    D+   Sep13   0:04  \_ dd if=/dev/urandom of=/mnt/md_test/testfile bs=1M count=1000
root      8983  0.0  0.0   7116  2080 ?        Ds   Sep13   0:00 /usr/sbin/mdadm --grow --continue /dev/md0

[root@dell-pr1700-02 ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop6[7] loop4[6] loop5[5](S) loop3[3] loop2[2] loop1[1] loop0[0]
      2039808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  reshape =  0.0% (1/509952) finish=1059.5min speed=7K/sec

unused devices: <none>

So the tasks stuck in D state are kworker/u8:1, jbd2/md0-8, the dd writer
and mdadm --grow --continue, and the reshape looks like it never really
started: it is still at 1/509952.

This time I didn't add the debug code that prints mddev->suspended and
active_stripes; I only applied the patches to the source.

Do you have any suggestions for what else I should check?

Best Regards
Xiao