Re: md raid5 fsync deadlock

On Thu, 01 Mar 2012 09:46:11 +0100 Milan Broz <mbroz@xxxxxxxxxx> wrote:

> On 03/01/2012 02:53 AM, NeilBrown wrote:
> > On Thu, 01 Mar 2012 00:31:08 +0100 Milan Broz<mbroz@xxxxxxxxxx>  wrote:
> 
> > Are you certain it is a deadlock?  No forward progress at all?
> 
> Seems so, it was in this state for several hours without progress.
> 
> > What is in md/stripe_cache_size?  Does it change?
> 
> > What happens if you double the number in stripe_cache_size?  What if you
> > double it again?
> 
> stripe_cache_size was 256, I doubled it to 512, now
>    stripe_cache_active is 390
>    stripe_cache size is 512
> and no progress.
> 
> With stripe_cache_size 1024 it survived a few iterations of the fio run,
> now it is locked up again:
>    stripe_cache_active is 921
>    stripe_cache size is 1024
> 

That definitely looks like something getting stuck inside RAID5.  There are
390 (or 921) active stripes that should be making progress but are blocked
waiting for something.
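
To confirm that the active count really is frozen rather than just busy, the
two sysfs values can be sampled a few seconds apart.  A minimal userspace
sketch follows; the helper names read_counter() and looks_stalled() are made
up for illustration, and the "unchanged and non-zero" test is only a
heuristic, not something md itself reports.

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer from a sysfs-style attribute file.
 * Returns -1 if the file cannot be opened or parsed. */
static long read_counter(const char *path)
{
    FILE *f;
    long value = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Heuristic (an assumption of this sketch): a non-empty stripe cache whose
 * active count is identical in two samples is consistent with stuck
 * stripes.  A busy but healthy array normally shows the count moving. */
static int looks_stalled(long active_before, long active_after)
{
    return active_before > 0 && active_before == active_after;
}
```

For an array named md0 (a placeholder), the file to sample would be
/sys/block/md0/md/stripe_cache_active, read twice with a sleep in between
and compared with looks_stalled().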

I would suggest modifying the 'status' function in raid5.c to print out some
details about the stripes in the stripe cache.
You would need to take the device_lock spinlock, then walk each chain in
stripe_hashtbl and print the 'state' and 'count' of each stripe_head, plus
the flags and the various bio pointers from each dev.
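
The shape of that walk can be sketched in userspace.  This is a toy model,
not kernel code: the toy_* structures are hypothetical stand-ins that only
borrow the field names mentioned above, a pthread mutex stands in for the
device_lock spinlock, and the bucket and device counts are arbitrary.  In
raid5.c the same loop would use the kernel's own list and locking helpers
and print through the seq_file passed to 'status'.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define NR_HASH 8   /* toy bucket count, not the kernel's value */
#define NR_DEVS 3   /* toy number of member devices per stripe */

/* Hypothetical per-device state: flags plus stand-ins for bio pointers. */
struct toy_dev {
    unsigned long flags;
    void *toread, *towrite, *written;
};

/* Hypothetical stripe head, chained into one hash bucket via 'next'. */
struct toy_stripe {
    struct toy_stripe *next;
    unsigned long long sector;
    unsigned long state;
    int count;
    struct toy_dev dev[NR_DEVS];
};

struct toy_conf {
    pthread_mutex_t device_lock;   /* stands in for the spinlock */
    struct toy_stripe *stripe_hashtbl[NR_HASH];
};

/* Walk every hash chain under the lock, printing one line per stripe and
 * one line per device.  Returns the number of stripes seen. */
static int dump_stripes(struct toy_conf *conf, FILE *out)
{
    struct toy_stripe *sh;
    int i, d, seen = 0;

    assert(conf != NULL && out != NULL);
    pthread_mutex_lock(&conf->device_lock);
    for (i = 0; i < NR_HASH; i++) {
        for (sh = conf->stripe_hashtbl[i]; sh; sh = sh->next) {
            fprintf(out, "sh %llu: state %#lx count %d\n",
                    sh->sector, sh->state, sh->count);
            for (d = 0; d < NR_DEVS; d++)
                fprintf(out, "  dev %d: flags %#lx toread %p towrite %p written %p\n",
                        d, sh->dev[d].flags, sh->dev[d].toread,
                        sh->dev[d].towrite, sh->dev[d].written);
            seen++;
        }
    }
    pthread_mutex_unlock(&conf->device_lock);
    return seen;
}
```

With a dump like this, stripes that all share the same 'state' bits or the
same non-NULL pointer on one dev would point at what they are waiting for.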

That might be helpful.

NeilBrown


