Re: live lock regression in raid5 reshape

On Fri, Feb 26 2016, Shaohua Li wrote:

> Hi,
>
> I hit a live lock in reshape test, which is introduced by:
>
> e9e4c377e2f563892c50d1d093dd55c7d518fc3d(md/raid5: per hash value and exclusive wait_for_stripe)
>
> The problem is get_active_stripe waits on conf->wait_for_stripe[hash]. Assume
> hash is 0. My test release stripes in this order:
> - release all stripes with hash 0
> - get_active_stripe still sleeps since active_stripes > max_nr_stripes * 3 / 4
> - release all stripes with hash other than 0. active_stripes becomes 0
> - get_active_stripe still sleeps, since nobody wakes up wait_for_stripe[0]
>
> The system live-locks. The problem is that active_stripes isn't a per-hash
> count. Reverting the patch makes the livelock go away.
>
> I haven't come up with a solution yet other than reverting the patch. Making
> active_stripes per-hash is a candidate, but I'm not sure whether that would
> create a thundering-herd problem, since each hash would cover fewer stripes.
> On the other hand, I'm wondering whether the patch still makes sense at all.
> Its commit log says the issue happens with a limited number of stripes, but
> the stripe count is now grown automatically.
>

->active_stripes does seem to be the core of the problem here.

The purpose of the comparison with max_nr_stripes*3/4 was to encourage
requests to be handled in large batches rather than dribbling out one at
a time.  That should encourage the creation of full stripe writes.  I
think it does (or at least: did) help but we know it isn't perfect.
There might be a better way.

If two threads are each writing full stripes of data, we would prefer that
one of them could allocate a full set of stripe_heads while the other gets
nothing for a little while, rather than each getting half the number of
stripe_heads it needs.

Possibly we could impose this restriction only on the first stripe_head in a
stripe (i.e. the start of a chunk).  That should have much the same effect
but wouldn't cause the problem you are seeing.

Certainly backing this out is simplest (particularly if you want to send
it to -stable).  I suspect it would be best to ultimately keep the
hashed wait queues if we can avoid the livelock.

thanks,
NeilBrown


