Re: raid5 hang on get_active_stripe

Neil Brown <neilb@xxxxxxx> · Sat, 27 May 2006 09:55:54 +1000

On Friday May 26, dean@xxxxxxxxxx wrote:
> On Tue, 23 May 2006, Neil Brown wrote:
> 
> i applied them against 2.6.16.18 and two days later i got my first hang... 
> below is the stripe_cache foo.
> 
> thanks
> -dean
> 
> neemlark:~# cd /sys/block/md4/md/
> neemlark:/sys/block/md4/md# cat stripe_cache_active 
> 255
> 0 preread
> bitlist=0 delaylist=255
> neemlark:/sys/block/md4/md# cat stripe_cache_active 
> 255
> 0 preread
> bitlist=0 delaylist=255
> neemlark:/sys/block/md4/md# cat stripe_cache_active 
> 255
> 0 preread
> bitlist=0 delaylist=255

Thanks.  This narrows it down quite a bit... too much, in fact:  I can
now say for sure that this cannot possibly happen :-)

Two things that might be helpful:
  1/ Do you have any patches on 2.6.16.18 other than the 3 I
    sent you?  If you do I'd like to see them, just in case.
  2/ The message.gz you sent earlier with the
          echo t > /proc/sysrq-trigger
     trace in it didn't contain information about md4_raid5 - the 
     controlling thread for that array.  It must have been lost
     due to a buffer overflowing.  Next time it happens, could you
     try to get this trace again and see if you can find out what
     md4_raid5 is doing.  Maybe do the 'echo t' several times.
     I think that you need a kernel recompile to make the dmesg
     buffer larger.
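
Something along these lines might make it easier to repeat the 'echo t'
and pick out the md4_raid5 entry each time.  It is only a rough sketch:
the function names, the 30-line context window, the one second delay and
the output path are all my own assumptions, not anything md or the
kernel provides, and it needs to run as root with SysRq enabled.

```shell
#!/bin/sh
# Sketch only: trigger several sysrq-t task dumps and keep whatever the
# kernel log says about one thread.  show_task and capture_traces are
# illustrative names, not existing tools.

# Print every line that mentions the task, plus the 30 lines after it
# (assumption: the call trace follows the task's header line and fits
# in 30 lines).  Uses grep -A, which GNU grep supports.
show_task() {
    # $1 = task name, $2 = file holding a saved dmesg dump
    grep -A 30 -- "$1" "$2"
}

# Trigger $2 dumps one second apart and save dmesg after each one to
# $3.1, $3.2, ...  Warn when a dump has no entry for the task, which is
# what a too-small log buffer would look like.
capture_traces() {
    task=$1
    count=$2
    out=$3
    i=1
    while [ "$i" -le "$count" ]; do
        echo t > /proc/sysrq-trigger
        sleep 1
        dmesg > "$out.$i"
        show_task "$task" "$out.$i" ||
            echo "no $task entry in $out.$i" >&2
        i=$((i + 1))
    done
}

# Example, as root:
#   capture_traces md4_raid5 3 /tmp/sysrq-t
```

If every dump comes back without an md4_raid5 entry, that points at the
buffer size again.  As far as I know the build-time setting for that is
CONFIG_LOG_BUF_SHIFT, but please check it against your own .config.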

Thanks for your patience - this must be very frustrating for you.

NeilBrown