On Tue, 14 Mar 2006, Neil Brown wrote:

> On Monday March 13, patrik@xxxxxxxxxxx wrote:
> > Hi all,
> >
> > I just experienced some kind of lockup accessing my 8-drive raid5
> > (2.6.16-rc4-mm2). The system has been up for 16 days running fine, but
> > now processes that try to read the md device hang. ps tells me they are
> > all sleeping in get_active_stripe. There is nothing in the syslog, and I
> > can read from the individual drives fine with dd. mdadm says the state
> > is "active".
>
> Hmmm... That's sad. That's going to be very hard to track down.
>
> If you could
>    echo t > /proc/sysrq-trigger
>
> and send me the dump that appears in the kernel log, I would
> appreciate it. I doubt it will be very helpful, but it is the best
> bet I can come up with.

i seem to be running into this as well... it has happened several times
in the past three weeks. i attached the kernel log output... it's a debian
2.6.16 kernel, which is based mostly on 2.6.16.10.

md4 : active raid5 sdd1[0] sde1[5](S) sdh1[4] sdg1[3] sdf1[2] sdc1[1]
      1562834944 blocks level 5, 128k chunk, algorithm 2 [5/5] [UUUUU]
      bitmap: 3/187 pages [12KB], 1024KB chunk

those drives are on 3w-xxxx (7508 controller). i'm using lvm2 and xfs as
the filesystem (although i'm pretty sure an ext3 fs on another lv is
hanging too -- but i forgot to check before i unwedged it).

let me know if anything else is useful and i can try to catch it next
time.

> You could try increasing the size of the stripe cache
>    echo 512 > /sys/block/mdX/md/stripe_cache_size
> (choose and appropriate 'X').

yeah that got things going again -- it took a minute or so maybe, i
wasn't paying attention as to how fast things cleared up.

> Maybe check the content of
>    /sys/block/mdX/md/stripe_cache_active
> as well.

next time i'll check this before i increase stripe_cache_size... it's
0 now, but the raid5 is working again...

-dean
Attachment:
messages.gz
Description: Binary data
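
For next time, the order discussed above (look at stripe_cache_active first, then raise stripe_cache_size) could be scripted roughly as below. This is only a sketch: the function names and the MD_SYSFS_ROOT override are made up for illustration, and the only things taken from the thread are the two sysfs attributes and the value 512.

```shell
#!/bin/sh
# Sketch only: print the stripe cache state of an md array, then enlarge the cache.
# MD_SYSFS_ROOT is a hypothetical override (handy for dry runs); it defaults to /sys/block.

md_stripe_cache_report() {
    md_dev=$1
    md_dir="${MD_SYSFS_ROOT:-/sys/block}/$md_dev/md"
    if [ ! -r "$md_dir/stripe_cache_size" ]; then
        echo "no stripe cache attributes under $md_dir" >&2
        return 1
    fi
    # Capture both values before anything is changed.
    echo "stripe_cache_active=$(cat "$md_dir/stripe_cache_active")"
    echo "stripe_cache_size=$(cat "$md_dir/stripe_cache_size")"
}

md_stripe_cache_grow() {
    md_dev=$1
    new_size=$2
    md_dir="${MD_SYSFS_ROOT:-/sys/block}/$md_dev/md"
    # Report first, so the pre-change numbers end up in the output.
    md_stripe_cache_report "$md_dev" || return 1
    echo "$new_size" > "$md_dir/stripe_cache_size"
}
```

Something like `md_stripe_cache_grow md4 512` (as root, with the right array name) would print the two current values and then write the new size, so the stripe_cache_active reading from the hung state is not lost.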