On Thu, 01 Mar 2012 09:46:11 +0100 Milan Broz <mbroz@xxxxxxxxxx> wrote:

> On 03/01/2012 02:53 AM, NeilBrown wrote:
> > On Thu, 01 Mar 2012 00:31:08 +0100 Milan Broz <mbroz@xxxxxxxxxx> wrote:
>
> > Are you certain it is a deadlock? No forward progress at all?
>
> Seems so, it was in this state for several hours without progress.
>
> > What is in md/stripe_cache_size? Does it change?
>
> > What happens if you double the number in stripe_cache_size? What if you
> > double it again?
>
> stripe_cache_size was 256, I doubled it to 512, now
> stripe_cache_active is 390
> stripe_cache_size is 512
> and no progress.
>
> With stripe_cache_size 1024 it survived a few iterations of the fio run, now it
> is locked up again:
> stripe_cache_active is 921
> stripe_cache_size is 1024
>

That definitely looks like something getting stuck inside RAID5.
There are 390 (or 921) stripes that should be being processed, but they are
blocked waiting for something.

I would suggest modifying the 'status' function in raid5.c to print out some
details about the stripes in the stripe cache.
You would need to take the device_lock spinlock, then walk through each chain
from stripe_hashtbl and print out the 'state' and 'count' for each stripe-head,
plus the flags and the various bio pointers from each dev.

That might be helpful.

NeilBrown
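A userspace sketch of the walk described above (not a kernel patch) may make the suggestion more concrete. Everything in it is a hypothetical, simplified model: struct model_conf, struct model_stripe, struct model_dev, NR_HASH, MAX_DISKS and dump_stripe_cache() are illustrative names only. The fields device_lock, stripe_hashtbl, state, count, flags and the toread/towrite/written bio pointers loosely echo what the real raid5 structures carry, but their real types, hashing and layout differ. A pthread mutex stands in for the device_lock spinlock, and fprintf() stands in for whatever output path the status function actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical sizes for this model only; not the kernel's values. */
#define NR_HASH   4   /* number of hash chains in the model cache */
#define MAX_DISKS 4   /* maximum devices per stripe in the model */

/* Simplified stand-in for a per-device entry of a stripe-head. */
struct model_dev {
    unsigned long flags;  /* per-device state bits */
    void *toread;         /* pending read bio (opaque pointer here) */
    void *towrite;        /* pending write bio */
    void *written;        /* bio whose data has been copied in */
};

/* Simplified stand-in for a stripe-head on one hash chain. */
struct model_stripe {
    unsigned long long sector;
    unsigned long state;             /* stripe state bits */
    int count;                       /* reference count */
    struct model_stripe *hash_next;  /* next stripe on the same chain */
    struct model_dev dev[MAX_DISKS];
};

/* Simplified stand-in for the array's private configuration. */
struct model_conf {
    pthread_mutex_t device_lock;     /* models the device_lock spinlock */
    struct model_stripe *stripe_hashtbl[NR_HASH];
    int raid_disks;
};

/* Print every cached stripe while holding the lock, so the dump is a
 * consistent snapshot. Returns the number of stripes printed. */
static int dump_stripe_cache(struct model_conf *conf, FILE *out)
{
    int printed = 0;

    assert(conf != NULL && out != NULL);
    assert(conf->raid_disks >= 0 && conf->raid_disks <= MAX_DISKS);

    pthread_mutex_lock(&conf->device_lock);
    for (int h = 0; h < NR_HASH; h++) {
        /* Walk one hash chain from its head to the end. */
        for (struct model_stripe *sh = conf->stripe_hashtbl[h]; sh != NULL;
             sh = sh->hash_next) {
            fprintf(out, "stripe %llu: state=%#lx count=%d\n",
                    sh->sector, sh->state, sh->count);
            for (int i = 0; i < conf->raid_disks; i++) {
                const struct model_dev *dev = &sh->dev[i];
                fprintf(out, "  dev[%d]: flags=%#lx toread=%p towrite=%p written=%p\n",
                        i, dev->flags, dev->toread, dev->towrite, dev->written);
            }
            printed++;
        }
    }
    pthread_mutex_unlock(&conf->device_lock);
    return printed;
}
```

Holding the lock for the whole walk means each dump is a consistent snapshot, which matters when comparing two dumps taken minutes apart to see which stripes keep the same state and a non-zero count. In the kernel this would presumably mean holding device_lock (possibly with interrupts disabled) for the length of the print, which seems tolerable for a one-off debugging patch but not something to leave in.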