On Fri, 22 May 2015 16:44:02 -0700 Shaohua Li <shli@xxxxxxxxxx> wrote:

> On Fri, May 22, 2015 at 03:30:58PM +1000, NeilBrown wrote:
> > If a stripe is a member of a batch, but not the head, it must
> > not be handled separately from the rest of the batch.
> >
> > 'clear_batch_ready()' handles this requirement to some
> > extent but not completely.  If a member is passed to handle_stripe()
> > a second time it returns '0' indicating the stripe can be handled,
> > which is wrong.
> > So add an extra test.
> >
> > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > ---
> >  drivers/md/raid5.c |    6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index c3ccefbd4fe7..9a803b735848 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -4192,9 +4192,13 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
> >
> >  static int clear_batch_ready(struct stripe_head *sh)
> >  {
> > +	/* Return '1' if this is a member of batch, or
> > +	 * '0' if it is a lone stripe or a head which can now be
> > +	 * handled.
> > +	 */
> >  	struct stripe_head *tmp;
> >  	if (!test_and_clear_bit(STRIPE_BATCH_READY, &sh->state))
> > -		return 0;
> > +		return (sh->batch_head && sh->batch_head != sh);
> >  	spin_lock(&sh->stripe_lock);
> >  	if (!sh->batch_head) {
> >  		spin_unlock(&sh->stripe_lock);
>
> which case can this happen in?

It definitely happens as I had reliable problems until I added this fix.
'retry_aligned_read()' can call handle_stripe() on any stripe at any time,
but I doubt that would apply.
I might try putting a warn-on there and see if it provides any hints.

>
> Patches look good. But I'm not in Fusionio any more, so can't check the
> performance in big raid array with fast flash cards. I'm doing some tests here.
> I hit a warning in break_stripe_batch_list, STRIPE_BIT_DELAY is set in the
> stripe state. I'm checking the reason, but if you have thoughts I can try
> immediately, please let me know.

I got STRIPE_BIT_DELAY a few times.  That was the main reason for
  md/raid5: ensure whole batch is delayed for all required bitmap updates.

and they went away after I got that patch right.
Maybe there is a race in there..

If you can reproduce it, maybe WARN whenever STRIPE_BIT_DELAY gets set on a
stripe with ->batch_head.

>
> Thanks,
> Shaohua

Thanks,
NeilBrown
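
For readers following the patch quoted above, here is a minimal userspace sketch of the return-value change in clear_batch_ready(). It is not the kernel code: struct stripe_model, test_and_clear_flag(), clear_batch_ready_model() and second_call_on_member() are hypothetical names, the locking is left out, and the part of the function after the quoted hunk is an assumption derived from the comment the patch adds (1 for a non-head member, 0 for a lone stripe or a head).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct stripe_head. */
struct stripe_model {
	bool batch_ready;                /* models the STRIPE_BATCH_READY bit */
	struct stripe_model *batch_head; /* NULL for a lone stripe */
};

/* Models test_and_clear_bit(): returns the old value and clears the flag. */
static bool test_and_clear_flag(bool *flag)
{
	bool old = *flag;

	*flag = false;
	return old;
}

/*
 * Returns 1 if the stripe is a non-head batch member (must not be handled
 * on its own), 0 if it may be handled.  'patched' selects between the old
 * and the new early-return behaviour.  Locking is omitted in this model.
 */
static int clear_batch_ready_model(struct stripe_model *sh, bool patched)
{
	if (!test_and_clear_flag(&sh->batch_ready)) {
		if (!patched)
			return 0; /* old: always claims "can be handled" */
		return sh->batch_head && sh->batch_head != sh;
	}
	if (!sh->batch_head)
		return 0; /* lone stripe */
	if (sh->batch_head != sh)
		return 1; /* non-head member */
	return 0;         /* head (assumed from the patch comment) */
}

/* Calls the model twice on a non-head member; returns the second result. */
static int second_call_on_member(bool patched)
{
	struct stripe_model head = { .batch_ready = true, .batch_head = NULL };
	struct stripe_model member = { .batch_ready = true, .batch_head = &head };

	head.batch_head = &head;
	(void)clear_batch_ready_model(&member, patched); /* first call */
	return clear_batch_ready_model(&member, patched);
}
```

In this model the second call on a member returns 0 without the extra test and 1 with it, which is the case described in the patch description.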