Re: [PATCH] raid5: don't warn with STRIPE_ACTIVE flag in break_stripe_batch_list

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Song,

On 9/9/19 5:10 PM, Song Liu wrote:


On Sep 6, 2019, at 11:13 AM, Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx> wrote:



On 9/5/19 7:15 PM, Song Liu wrote:

On Sep 5, 2019, at 9:09 AM, Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx> wrote:

Replace STRIPE_SYNCING with STRIPE_ACTIVE in the subject ...

On 8/30/19 4:07 PM, Guoqing Jiang wrote:

On 8/30/19 1:26 AM, Song Liu wrote:
On Aug 29, 2019, at 8:00 AM, Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx> wrote:

Hi Song,

On 8/29/19 7:42 AM, Song Liu wrote:
I read the code again, and now I am not sure whether we are fixing the issue.
This WARN_ONCE() does not run for head_sh, which should have STRIPE_ACTIVE.
It only runs on other stripes in the batch, which should not have STRIPE_ACTIVE.
  From the original commit which introduced batch write, it has the description
which I think is align with your above sentence.

"With below patch, we will automatically batch adjacent full stripe write
together. Such stripes will be added to the batch list. Only the first stripe
of the list will be put to handle_list and so run handle_stripe().".

Could you point me related code which achieve the above purpose? Thanks.
Do you mean which code makes sure the batched stripe will not be handled?
This is done via properly maintain STRIPE_HANDLE bit.
The raid5_make_request always set STRIPE_HANDLE for stripe which is returned from
raid5_get_active_stripe, so seems the batched stripe could also set with STRIPE_HANDLE
too, am I miss something?

I checked the STRIPE_IO_STARTED flag, it is set in the path: handle_stripe -> ops_run_io.
Since both STRIPE_IO_STARTED and STRIPE_ACTIVE are valid, which means the stripe
should be handling in the middle of handle_stripe.

Maybe the scenario is that raid5_make_request get a lone stripe, then it is added to handle_list
since STRIPE_HANDLE is valid, then handle_stripe sets STRIPE_ACTIVE flag to the stripe.

Meantime, another full write writes to the same stripe, and added stripe to batch_list by:
      raid5_make_request -> add_stripe_bio -> stripe_add_to_batch_list.

Then the warning is triggered when err happens in the batch head stripe. There are two
ways to address the warning I think.

1. remove the checking of STRIPE_ACTIVE flag since it is possible that a batched stripe
could have the flag.
2. if STRIPE_ACTIVE flag is valid, then don't add stripe to batch list.
I think we should not set STRIPE_HANDLE if the stripe is added to a batch, so something
like:

diff --git i/drivers/md/raid5.c w/drivers/md/raid5.c
index 3de4e13bde98..af19f08627ce 100644
--- i/drivers/md/raid5.c
+++ w/drivers/md/raid5.c
@@ -3308,8 +3308,11 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
         }
         spin_unlock_irq(&sh->stripe_lock);

-       if (stripe_can_batch(sh))
+       if (stripe_can_batch(sh)) {
                 stripe_add_to_batch_list(conf, sh);
+               if (sh->batch_head)
+                       return 0;
+       }
         return 1;

   overlap:


I haven't tested this yet. It could be totally wrong.

The return value of add_stripe_bio should mean the bio is added to stripe or not.
With the change, '0' could also means stripe is added to batch list, which is really
confusing. Or we can check batch_head before STRIPE_HANDLE.

@@ -5718,7 +5727,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
                                 do_flush = false;
                         }

-                       set_bit(STRIPE_HANDLE, &sh->state);
+                       if (!sh->batch_head)
+                               set_bit(STRIPE_HANDLE, &sh->state);

I think this works.

Thanks.




Also, there is short window between set STRIPE_ACTIVE and clear the flag in handle_stripe,
does it make sense to make rough change like this?

  static void handle_stripe(struct stripe_head *sh)
  {
         struct stripe_head_state s;
@@ -4675,7 +4682,7 @@ static void handle_stripe(struct stripe_head *sh)
         struct r5dev *pdev, *qdev;

         clear_bit(STRIPE_HANDLE, &sh->state);
-       if (test_and_set_bit_lock(STRIPE_ACTIVE, &sh->state)) {
+       if (test_bit(STRIPE_ACTIVE, &sh->state)) {
                 /* already being handled, ensure it gets handled
                  * again when current action finishes */
                 set_bit(STRIPE_HANDLE, &sh->state);
@@ -4683,9 +4690,9 @@ static void handle_stripe(struct stripe_head *sh)
         }

         if (clear_batch_ready(sh) ) {
-               clear_bit_unlock(STRIPE_ACTIVE, &sh->state);
                 return;
         }
+       set_bit(STRIPE_ACTIVE, &sh->state);
I don't think we can do this. We need atomic test_and_set for the stripe.

Hmm, it guarantees only one process can handle the stripe, others have to return
earlier. And the side effect in current code is that stripe could set with ACTIVE flag
shortly even it is in batch list.

I guess this is not a problem with the change above?


Yes, I agree.

I am not sure if the warning is caused with the scenario which I described before.

>>> Maybe the scenario is that raid5_make_request get a lone stripe, then it is added to handle_list
>>> since STRIPE_HANDLE is valid, then handle_stripe sets STRIPE_ACTIVE flag to the stripe.
>>>
>>> Meantime, another full write writes to the same stripe, and added stripe to batch_list by:
>>> raid5_make_request -> add_stripe_bio -> stripe_add_to_batch_list.
>>>
>>> Then the warning is triggered when err happens in the batch head stripe.

Anyway, I will prepare a patch for above change, and let's see if the warning could
happen or not in future.

BRs,
Guoqing



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux