Re: [PATCH v5 00/14] dm-raid/md/raid: fix v6.7 regressions

Hi,

On 2024/02/16 13:46, Benjamin Marzinski wrote:
On Thu, Feb 15, 2024 at 02:24:34PM -0800, Song Liu wrote:
On Thu, Feb 1, 2024 at 1:30 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

[...]

[1] https://lore.kernel.org/all/CALTww29QO5kzmN6Vd+jT=-8W5F52tJjHKSgrfUc1Z1ZAeRKHHA@xxxxxxxxxxxxxx/

Yu Kuai (14):
   md: don't ignore suspended array in md_check_recovery()
   md: don't ignore read-only array in md_check_recovery()
   md: make sure md_do_sync() will set MD_RECOVERY_DONE
   md: don't register sync_thread for reshape directly
   md: don't suspend the array for interrupted reshape
   md: fix missing release of 'active_io' for flush

Applied 1/14-5/14 to md-6.8 branch (6/14 was applied earlier).

Thanks,
Song

I'm still seeing new failures that I can't reproduce in the 6.6 kernel,
specifically:

lvconvert-raid-reshape-stripes-load-reload.sh
lvconvert-repair-raid.sh

With lvconvert-raid-reshape-stripes-load-reload.sh, patch 12/14
("md/raid456: fix a deadlock for dm-raid456 while io concurrent with
reshape") is changing a hang to a corruption. The issue is that we
can't simply fail IO that crosses the reshape position. I assume that
the correct thing to do is have dm-raid reissue it after the suspend,
when the reshape can make progress again. Perhaps something like this,
only less naive (although this patch does make the test pass for me).
Heinz, any thoughts on this? Otherwise, I'll look into this a little
more and post an RFC patch.

Does the corruption look like the one below?

[12504.959682] BUG bio-296 (Not tainted): Object already free
[12504.960239] -----------------------------------------------------------------------------
[12504.960239]
[12504.961209] Allocated in mempool_alloc+0xe8/0x270 age=30 cpu=1 pid=203288
[12504.961905]  kmem_cache_alloc+0x36a/0x3b0
[12504.962324]  mempool_alloc+0xe8/0x270
[12504.962712]  bio_alloc_bioset+0x3b5/0x920
[12504.963129]  bio_alloc_clone+0x3e/0x160
[12504.963533]  alloc_io+0x3d/0x1f0
[12504.963876]  dm_submit_bio+0x12f/0xa30
[12504.964267]  __submit_bio+0x9c/0xe0
[12504.964639]  submit_bio_noacct_nocheck+0x25a/0x570
[12504.965136]  submit_bio_wait+0xc2/0x160
[12504.965535]  blkdev_issue_zeroout+0x19b/0x2e0
[12504.965991]  ext4_init_inode_table+0x246/0x560
[12504.966462]  ext4_lazyinit_thread+0x750/0xbe0
[12504.966922]  kthread+0x1b4/0x1f0

I assume that this is a dm problem, and I'm still trying to debug it.
Can you explain in more detail why IO that crosses the reshape position
can't simply be failed?

Thanks,
Kuai


=========================================================
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index ed8c28952b14..ff481d494b04 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3345,6 +3345,14 @@ static int raid_map(struct dm_target *ti, struct bio *bio)
  	return DM_MAPIO_SUBMITTED;
  }
+static int raid_end_io(struct dm_target *ti, struct bio *bio,
+		       blk_status_t *error)
+{
+	if (*error != BLK_STS_IOERR || !dm_noflush_suspending(ti))
+		return DM_ENDIO_DONE;
+	return DM_ENDIO_REQUEUE;
+}

+
  /* Return sync state string for @state */
  enum sync_state { st_frozen, st_reshape, st_resync, st_check, st_repair, st_recover, st_idle };
  static const char *sync_str(enum sync_state state)
@@ -4100,6 +4108,7 @@ static struct target_type raid_target = {
  	.ctr = raid_ctr,
  	.dtr = raid_dtr,
  	.map = raid_map,
+	.end_io = raid_end_io,
  	.status = raid_status,
  	.message = raid_message,
  	.iterate_devices = raid_iterate_devices,
=========================================================
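
For anyone following along, the decision made by the draft raid_end_io()
hook above can be modeled in plain userspace C. This is only a sketch of
the condition in the diff, not kernel code: the enum names and
model_raid_end_io() are hypothetical stand-ins for blk_status_t,
DM_ENDIO_DONE/DM_ENDIO_REQUEUE and dm_noflush_suspending(), and the model
says nothing about how raid5 decides to fail the IO in the first place.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's blk_status_t values. */
enum model_status { MODEL_STS_OK, MODEL_STS_IOERR, MODEL_STS_OTHER };

/* Hypothetical stand-ins for DM_ENDIO_DONE and DM_ENDIO_REQUEUE. */
enum model_endio { MODEL_ENDIO_DONE, MODEL_ENDIO_REQUEUE };

/*
 * Same condition as the draft hook: requeue only when the bio ended with
 * an IO error AND the target is in a noflush suspend. Everything else
 * completes as usual.
 */
static enum model_endio model_raid_end_io(enum model_status error,
					  bool noflush_suspending)
{
	if (error != MODEL_STS_IOERR || !noflush_suspending)
		return MODEL_ENDIO_DONE;
	return MODEL_ENDIO_REQUEUE;
}

/* Walk the small truth table of the condition. */
static void model_check(void)
{
	/* IO error during a noflush suspend: hand the bio back for reissue. */
	assert(model_raid_end_io(MODEL_STS_IOERR, true) == MODEL_ENDIO_REQUEUE);
	/* IO error outside a noflush suspend: the error is reported. */
	assert(model_raid_end_io(MODEL_STS_IOERR, false) == MODEL_ENDIO_DONE);
	/* Success or any other status is never requeued. */
	assert(model_raid_end_io(MODEL_STS_OK, true) == MODEL_ENDIO_DONE);
	assert(model_raid_end_io(MODEL_STS_OTHER, true) == MODEL_ENDIO_DONE);
}
```

Note that the condition treats every BLK_STS_IOERR seen during a noflush
suspend as requeueable, including genuine media errors, which is likely
part of why the draft is described as naive.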


   md: export helpers to stop sync_thread
   md: export helper md_is_rdwr()
   dm-raid: really frozen sync_thread during suspend
   md/dm-raid: don't call md_reap_sync_thread() directly
   dm-raid: add a new helper prepare_suspend() in md_personality
   md/raid456: fix a deadlock for dm-raid456 while io concurrent with
     reshape
   dm-raid: fix lockdep waring in "pers->hot_add_disk"
   dm-raid: remove mddev_suspend/resume()

  drivers/md/dm-raid.c |  78 +++++++++++++++++++--------
  drivers/md/md.c      | 126 +++++++++++++++++++++++++++++--------------
  drivers/md/md.h      |  16 ++++++
  drivers/md/raid10.c  |  16 +-----
  drivers/md/raid5.c   |  61 +++++++++++----------
  5 files changed, 192 insertions(+), 105 deletions(-)

--
2.39.2


