Re: [PATCH -next 1/8] md/raid10: prevent soft lockup while flush writes

Hi,

On 2023/04/25 8:23, Song Liu wrote:
On Thu, Apr 20, 2023 at 4:31 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

From: Yu Kuai <yukuai3@xxxxxxxxxx>

Currently, there is no limit on the number of plugged bios for
raid1/raid10. When flushing writes, raid1 calls cond_resched() but raid10
does not, so a large number of writes can cause a soft lockup.

The following soft lockup can be triggered easily by a writeback test
on raid10 backed by ramdisks:

watchdog: BUG: soft lockup - CPU#10 stuck for 27s! [md0_raid10:1293]
Call Trace:
  <TASK>
  call_rcu+0x16/0x20
  put_object+0x41/0x80
  __delete_object+0x50/0x90
  delete_object_full+0x2b/0x40
  kmemleak_free+0x46/0xa0
  slab_free_freelist_hook.constprop.0+0xed/0x1a0
  kmem_cache_free+0xfd/0x300
  mempool_free_slab+0x1f/0x30
  mempool_free+0x3a/0x100
  bio_free+0x59/0x80
  bio_put+0xcf/0x2c0
  free_r10bio+0xbf/0xf0
  raid_end_bio_io+0x78/0xb0
  one_write_done+0x8a/0xa0
  raid10_end_write_request+0x1b4/0x430
  bio_endio+0x175/0x320
  brd_submit_bio+0x3b9/0x9b7 [brd]
  __submit_bio+0x69/0xe0
  submit_bio_noacct_nocheck+0x1e6/0x5a0
  submit_bio_noacct+0x38c/0x7e0
  flush_pending_writes+0xf0/0x240
  raid10d+0xac/0x1ed0

Is it possible to trigger this with a mdadm test?


The test I mentioned in patch 8 can trigger this problem reliably, so
I think adding a new test based on it can achieve this.

Thanks,
Kuai
Thanks,
Song


This patch fixes the problem by adding cond_resched() to raid10, as
raid1 already does.
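To illustrate why a yield point inside the loop matters, here is a minimal userspace sketch of the loop shape the patch touches: a detached singly linked list of pending writes is walked, each entry is unlinked and submitted, and the thread yields between submissions. It assumes hypothetical stand-ins (`struct fake_bio`, `fake_submit()`, `flush_pending()`) in place of the kernel's `struct bio` and `submit_bio_noacct()`, and uses POSIX `sched_yield()` as a rough analogue of `cond_resched()`. Note the difference: `cond_resched()` only reschedules when the scheduler has flagged that it is needed, whereas `sched_yield()` always offers the CPU.

```c
#include <sched.h>  /* sched_yield() */
#include <stddef.h> /* NULL */

/* Hypothetical stand-in for struct bio: only the bi_next link matters here. */
struct fake_bio {
	struct fake_bio *bi_next;
};

/* Hypothetical stand-in for submit_bio_noacct(). With brd, the write can
 * complete synchronously inside this call (see the trace above), so every
 * iteration may do a lot of work in the flushing thread's context. */
static void fake_submit(struct fake_bio *bio)
{
	(void)bio;
}

/* Submit every bio on a detached pending list, yielding after each one,
 * in the same shape as the loops patched in flush_pending_writes() and
 * raid10_unplug(). Returns the number of bios submitted. */
static int flush_pending(struct fake_bio *bio)
{
	int n = 0;

	while (bio) {
		struct fake_bio *next = bio->bi_next;

		bio->bi_next = NULL;  /* unlink before submitting */
		fake_submit(bio);
		n++;
		bio = next;
		sched_yield();        /* userspace analogue of cond_resched() */
	}
	return n;
}
```

Without the yield, a long list keeps the thread on the CPU for the whole walk, which is what the watchdog reports as a soft lockup.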

Note that the unlimited plugged bio list still needs to be optimized:
when writing back a large number of dirty pages, it consumes a lot of
memory and I/O latency is quite bad.
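One way to bound the plugged list is to count queued writes and drain the list once a cap is reached. The sketch below is hypothetical and not part of this patch, which only notes that such an optimization is still needed: `struct fake_plug`, `plug_add()`, `plug_flush()` and the value of `MAX_PLUGGED` are illustrative assumptions, not kernel APIs.

```c
#include <stddef.h> /* NULL */

#define MAX_PLUGGED 32 /* hypothetical cap, chosen only for illustration */

/* Hypothetical stand-in for struct bio. */
struct fake_bio {
	struct fake_bio *bi_next;
};

/* Hypothetical plug state: a FIFO of pending writes plus a counter. */
struct fake_plug {
	struct fake_bio *head, *tail;
	unsigned int count;
	unsigned int flushes; /* how many times the list has been drained */
};

/* Drain the pending list; a real implementation would submit each bio. */
static void plug_flush(struct fake_plug *plug)
{
	struct fake_bio *bio = plug->head;

	while (bio) {
		struct fake_bio *next = bio->bi_next;

		bio->bi_next = NULL;
		/* submit_bio_noacct(bio) would go here in the kernel */
		bio = next;
	}
	plug->head = plug->tail = NULL;
	plug->count = 0;
	plug->flushes++;
}

/* Queue a write at the tail; drain once MAX_PLUGGED writes are pending so
 * that memory use and the length of any single flush loop stay bounded. */
static void plug_add(struct fake_plug *plug, struct fake_bio *bio)
{
	bio->bi_next = NULL;
	if (plug->tail)
		plug->tail->bi_next = bio;
	else
		plug->head = bio;
	plug->tail = bio;
	if (++plug->count >= MAX_PLUGGED)
		plug_flush(plug);
}
```

With a cap of 32, queueing 70 writes drains the list twice and leaves 6 pending for the final unplug. The trade-off is fewer merging opportunities in exchange for bounded memory and latency.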

Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
---
  drivers/md/raid10.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6590aa49598c..a116b7c9d9f3 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -921,6 +921,7 @@ static void flush_pending_writes(struct r10conf *conf)
                         else
                                 submit_bio_noacct(bio);
                         bio = next;
+                       cond_resched();
                 }
                 blk_finish_plug(&plug);
         } else
@@ -1140,6 +1141,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
                 else
                         submit_bio_noacct(bio);
                 bio = next;
+               cond_resched();
         }
         kfree(plug);
  }
--
2.39.2
