> On Sep 5, 2019, at 9:09 AM, Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx> wrote: > > Replace STRIPE_SYNCING with STRIPE_ACTIVE in the subject ... > > On 8/30/19 4:07 PM, Guoqing Jiang wrote: >> >> >> On 8/30/19 1:26 AM, Song Liu wrote: >>> >>>> On Aug 29, 2019, at 8:00 AM, Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx> wrote: >>>> >>>> Hi Song, >>>> >>>> On 8/29/19 7:42 AM, Song Liu wrote: >>>>> I read the code again, and now I am not sure whether we are fixing the issue. >>>>> This WARN_ONCE() does not run for head_sh, which should have STRIPE_ACTIVE. >>>>> It only runs on other stripes in the batch, which should not have STRIPE_ACTIVE. >>>> From the original commit which introduced batch write, it has the description >>>> which I think is align with your above sentence. >>>> >>>> "With below patch, we will automatically batch adjacent full stripe write >>>> together. Such stripes will be added to the batch list. Only the first stripe >>>> of the list will be put to handle_list and so run handle_stripe().". >>>> >>>> Could you point me related code which achieve the above purpose? Thanks. >>> Do you mean which code makes sure the batched stripe will not be handled? >>> This is done via properly maintain STRIPE_HANDLE bit. >> > > The raid5_make_request always set STRIPE_HANDLE for stripe which is returned from > raid5_get_active_stripe, so seems the batched stripe could also set with STRIPE_HANDLE > too, am I miss something? > > I checked the STRIPE_IO_STARTED flag, it is set in the path: handle_stripe -> ops_run_io. > Since both STRIPE_IO_STARTED and STRIPE_ACTIVE are valid, which means the stripe > should be handling in the middle of handle_stripe. > > Maybe the scenario is that raid5_make_request get a lone stripe, then it is added to handle_list > since STRIPE_HANDLE is valid, then handle_stripe sets STRIPE_ACTIVE flag to the stripe. > > Meantime, another full write writes to the same stripe, and added stripe to batch_list by: > raid5_make_request -> add_stripe_bio -> stripe_add_to_batch_list. > > Then the warning is triggered when err happens in the batch head stripe. There are two > ways to address the warning I think. > > 1. remove the checking of STRIPE_ACTIVE flag since it is possible that a batched stripe > could have the flag. > 2. if STRIPE_ACTIVE flag is valid, then don't add stripe to batch list. I think we should not set STRIPE_HANDLE if the stripe is added to a batch, so something like: diff --git i/drivers/md/raid5.c w/drivers/md/raid5.c index 3de4e13bde98..af19f08627ce 100644 --- i/drivers/md/raid5.c +++ w/drivers/md/raid5.c @@ -3308,8 +3308,11 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, } spin_unlock_irq(&sh->stripe_lock); - if (stripe_can_batch(sh)) + if (stripe_can_batch(sh)) { stripe_add_to_batch_list(conf, sh); + if (sh->batch_head) + return 0; + } return 1; overlap: I haven't tested this yet. It could be totally wrong. > > Also, there is short window between set STRIPE_ACTIVE and clear the flag in handle_stripe, > does it make sense to make rough change like this? > > static void handle_stripe(struct stripe_head *sh) > { > struct stripe_head_state s; > @@ -4675,7 +4682,7 @@ static void handle_stripe(struct stripe_head *sh) > struct r5dev *pdev, *qdev; > > clear_bit(STRIPE_HANDLE, &sh->state); > - if (test_and_set_bit_lock(STRIPE_ACTIVE, &sh->state)) { > + if (test_bit(STRIPE_ACTIVE, &sh->state)) { > /* already being handled, ensure it gets handled > * again when current action finishes */ > set_bit(STRIPE_HANDLE, &sh->state); > @@ -4683,9 +4690,9 @@ static void handle_stripe(struct stripe_head *sh) > } > > if (clear_batch_ready(sh) ) { > - clear_bit_unlock(STRIPE_ACTIVE, &sh->state); > return; > } > + set_bit(STRIPE_ACTIVE, &sh->state); I don't think we can do this. We need atomic test_and_set for the stripe. Thanks, Song