Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without

----- Original Message -----
> From: "NeilBrown" <neilb@xxxxxxxx>
> To: "Xiao Ni" <xni@xxxxxxxxxx>
> Cc: linux-raid@xxxxxxxxxxxxxxx
> Sent: Thursday, September 14, 2017 7:05:20 AM
> Subject: Re: [PATCH 0/4] RFC: attempt to remove md deadlocks with metadata without
> 
> On Wed, Sep 13 2017, Xiao Ni wrote:
> >
> > Hi Neil
> >
> > Sorry for the bad news. The test is still running and it's stuck again.
> 
> Any details?  Anything at all?  Just a little hint maybe?
> 
> Just saying "it's stuck again" is very nearly useless.
> 
Hi Neil

There is no useful information in /var/log/messages.

I also enabled dynamic debug for raid5.c:

echo file raid5.c +p > /sys/kernel/debug/dynamic_debug/control

There aren't any messages from that either.

It looks like a different problem.

[root@dell-pr1700-02 ~]# ps auxf | grep D
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      8381  0.0  0.0      0     0 ?        D    Sep13   0:00  \_ [kworker/u8:1]
root      8966  0.0  0.0      0     0 ?        D    Sep13   0:00  \_ [jbd2/md0-8]
root       824  0.0  0.1 216856  8492 ?        Ss   Sep03   0:06 /usr/bin/abrt-watch-log -F BUG: WARNING: at WARNING: CPU: INFO: possible recursive locking detected ernel BUG at list_del corruption list_add corruption do_IRQ: stack overflow: ear stack overflow (cur: eneral protection fault nable to handle kernel ouble fault: RTNL: assertion failed eek! page_mapcount(page) went negative! adness at NETDEV WATCHDOG ysctl table check failed : nobody cared IRQ handler type mismatch Machine Check Exception: Machine check events logged divide error: bounds: coprocessor segment overrun: invalid TSS: segment not present: invalid opcode: alignment check: stack segment: fpu exception: simd exception: iret exception: /var/log/messages -- /usr/bin/abrt-dump-oops -xtD
root       836  0.0  0.0 195052  3200 ?        Ssl  Sep03   0:00 /usr/sbin/gssproxy -D
root      1225  0.0  0.0 106008  7436 ?        Ss   Sep03   0:00 /usr/sbin/sshd -D
root     12411  0.0  0.0 112672  2264 pts/0    S+   00:50   0:00          \_ grep --color=auto D
root      8987  0.0  0.0 109000  2728 pts/2    D+   Sep13   0:04          \_ dd if=/dev/urandom of=/mnt/md_test/testfile bs=1M count=1000
root      8983  0.0  0.0   7116  2080 ?        Ds   Sep13   0:00 /usr/sbin/mdadm --grow --continue /dev/md0
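
Because `grep D` matches any line containing a capital D, the listing above also picks up unrelated processes such as `sshd -D`. A minimal sketch of a narrower filter follows, assuming a procps-style `ps` and a kernel that exposes /proc/<pid>/stack to root. The function names `filter_dstate` and `dump_dstate_stacks` are hypothetical helpers, not part of any existing tool.

```shell
# Keep only tasks whose STAT field begins with "D" (uninterruptible sleep).
# Reads "PID STAT COMMAND" lines on stdin, so it also works on saved output.
filter_dstate() {
    awk '$2 ~ /^D/ { print $1, $2, $3 }'
}

# Print the kernel stack of every D-state task.
# Assumption: run as root on a kernel that provides /proc/<pid>/stack.
dump_dstate_stacks() {
    ps -eo pid=,stat=,comm= | filter_dstate | while read -r pid stat comm; do
        echo "== $pid $stat $comm"
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done
}
```

Running `dump_dstate_stacks` as root would then show where each blocked task is waiting. Where sysrq is enabled, `echo w > /proc/sysrq-trigger` is another way to get the blocked tasks dumped into the kernel log.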

[root@dell-pr1700-02 ~]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 loop6[7] loop4[6] loop5[5](S) loop3[3] loop2[2] loop1[1] loop0[0]
      2039808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  reshape =  0.0% (1/509952) finish=1059.5min speed=7K/sec
      
unused devices: <none>


It looks like the reshape never gets started. This time I didn't add the code
that prints mddev->suspended and active_stripes; I only applied the patches to
the source. Do you have any suggestions for what else I should check?
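
Some of this state can be read without extra instrumentation. The sketch below prints a few md sysfs attributes, assuming the array is md0 as in the output above. The function name `md_snapshot` is a hypothetical helper. Note that `stripe_cache_active` reports the number of active stripe cache entries, while `suspend_lo`/`suspend_hi` describe the suspended I/O range and are not the same thing as mddev->suspended, which sysfs does not expose directly.

```shell
# Print selected md sysfs attributes from the given directory.
# Assumption: the default path matches the md0 array in this report.
md_snapshot() {
    dir=${1:-/sys/block/md0/md}
    for attr in array_state sync_action sync_completed reshape_position \
                suspend_lo suspend_hi stripe_cache_active; do
        if [ -r "$dir/$attr" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$dir/$attr")"
        else
            # Attribute missing for this personality or kernel version.
            printf '%s: (not available)\n' "$attr"
        fi
    done
}
```

Calling `md_snapshot` with no argument reads the md0 attributes; passing a directory lets the same function be pointed at another array.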

Best Regards
Xiao