Re: live lock regression in raid5 reshape

On Fri, Feb 26 2016, Shaohua Li wrote:

> Hi,
>
> I hit a live lock in reshape test, which is introduced by:
>
> e9e4c377e2f563892c50d1d093dd55c7d518fc3d(md/raid5: per hash value and exclusive wait_for_stripe)
>
> The problem is get_active_stripe waits on conf->wait_for_stripe[hash]. Assume
> hash is 0. My test release stripes in this order:
> - release all stripes with hash 0
> - get_active_stripe still sleeps since active_stripes > max_nr_stripes * 3 / 4
> - release all stripes with hash other than 0. active_stripes becomes 0
> - get_active_stripe still sleeps, since nobody wakes up wait_for_stripe[0]
>
> The system live-locks. The problem is that active_stripes isn't a per-hash
> count. Reverting the patch makes the livelock go away.
>
> I haven't come up with a solution yet other than reverting the patch. Making
> active_stripes per-hash is a candidate, but I'm not sure whether that would
> create a thundering-herd problem, since each hash would cover fewer stripes.
> On the other hand, I'm wondering whether the patch still makes sense at all.
> Its commit log says the issue happens with a limited number of stripes, but
> the stripe count is now grown automatically.
>

->active_stripes does seem to be the core of the problem here.

The purpose of the comparison with max_nr_stripes*3/4 was to encourage
requests to be handled in large batches rather than dribbling out one at
a time.  That should encourage the creation of full stripe writes.  I
think it does (or at least: did) help but we know it isn't perfect.
There might be a better way.

If two threads are each writing full stripes of data, we would prefer that
one of them could allocate a full set of stripe_heads while the other gets
nothing for a little while, rather than each getting half the number of
stripe_heads it needs.

Possibly we could impose this restriction only on the first stripe_head in a
stripe (i.e. the start of a chunk).  That should have much the same effect
but wouldn't cause the problem you are seeing.

Certainly backing this out is simplest (particularly if you want to send
it to -stable).  I suspect it would be best to ultimately keep the
hashed wait queues if we can avoid the livelock.

thanks,
NeilBrown


