Re: [PATCH] md/raid1: fix missing bitmap update w/o WriteMostly devices

On 1/4/22 7:04 AM, Song Liu wrote:
Commit [1] causes missing bitmap updates when there are no WriteMostly
devices.

Detailed steps to reproduce by Norbert (which somehow didn't make it to lore):

    # setup md10 (raid1) with two drives (1 GByte sparse files)
    dd if=/dev/zero of=disk1 bs=1024k seek=1024 count=0
    dd if=/dev/zero of=disk2 bs=1024k seek=1024 count=0

    losetup /dev/loop11 disk1
    losetup /dev/loop12 disk2

    mdadm --create /dev/md10 --level=1 --raid-devices=2 /dev/loop11 /dev/loop12

    # add bitmap (aka write-intent log)
    mdadm /dev/md10 --grow --bitmap=internal

    echo check > /sys/block/md10/md/sync_action

    root:# cat /sys/block/md10/md/mismatch_cnt
    0
    root:#

    # remove member drive disk2 (loop12)
    mdadm /dev/md10 -f loop12 ; mdadm /dev/md10 -r loop12

    # modify degraded md device
    dd if=/dev/urandom of=/dev/md10 bs=512 count=1

    # no blocks recorded as out of sync on the remaining member disk1/loop11
    root:# mdadm -X /dev/loop11 | grep Bitmap
              Bitmap : 16 bits (chunks), 0 dirty (0.0%)
    root:#

    # re-add disk2, nothing synced because of empty bitmap
    mdadm /dev/md10 --re-add /dev/loop12

    # check integrity again
    echo check > /sys/block/md10/md/sync_action

    # disk1 and disk2 are no longer in sync, reads return different data
    root:# cat /sys/block/md10/md/mismatch_cnt
    128
    root:#

    # clean up
    mdadm -S /dev/md10
    losetup -d /dev/loop11
    losetup -d /dev/loop12
    rm disk1 disk2

Fix this by moving the WriteMostly check to the if condition for
alloc_behind_master_bio().

[1] commit fd3b6975e9c1 ("md/raid1: only allocate write behind bio for WriteMostly device")
Fixes: fd3b6975e9c1 ("md/raid1: only allocate write behind bio for WriteMostly device")
Cc: stable@xxxxxxxxxxxxxxx # v5.12+
Cc: Guoqing Jiang <guoqing.jiang@xxxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>
Reported-by: Norbert Warmuth <nwarmuth@xxxxxxxxxxx>
Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Song Liu <song@xxxxxxxxxx>
---
  drivers/md/raid1.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7dc8026cf6ee..85505424f7a4 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1496,12 +1496,13 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
  		if (!r1_bio->bios[i])
  			continue;
-		if (first_clone && test_bit(WriteMostly, &rdev->flags)) {
+		if (first_clone) {
  			/* do behind I/O ?
  			 * Not if there are too many, or cannot
  			 * allocate memory, or a reader on WriteMostly
  			 * is waiting for behind writes to flush */
  			if (bitmap &&
+			    test_bit(WriteMostly, &rdev->flags) &&
  			    (atomic_read(&bitmap->behind_writes)
  			     < mddev->bitmap_info.max_write_behind) &&
  			    !waitqueue_active(&bitmap->behind_wait)) {

Indeed, I missed that md_bitmap_startwrite should always be called for the first clone.
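
To illustrate the point, here is a minimal user-space sketch of the control flow, not the kernel code itself. The names struct leg and bitmap_marks() are hypothetical and exist only for this example. It assumes the bitmap update sits inside the first_clone block, which is what the diff context and the remark above suggest. The write-behind decision is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mirror leg; only the WriteMostly flag is modeled. */
struct leg {
	bool write_mostly;
};

/*
 * Count how often the write-intent bitmap would be marked for one write.
 * old_cond = true models the outer condition removed by the patch,
 * old_cond = false models the condition after the patch.
 */
static int bitmap_marks(const struct leg *legs, size_t n, bool old_cond)
{
	bool first_clone = true;
	int marks = 0;

	for (size_t i = 0; i < n; i++) {
		bool enter = old_cond ? (first_clone && legs[i].write_mostly)
				      : first_clone;

		if (!enter)
			continue;

		/* After the patch, write-behind setup would be gated on
		 * legs[i].write_mostly here; it is not modeled. */
		marks++;		/* stands in for md_bitmap_startwrite() */
		first_clone = false;
	}
	return marks;
}
```

For two legs with neither marked WriteMostly, the old condition yields zero marks and the new one yields exactly one, which is consistent with the "0 dirty" bitmap in the reproducer. When at least one leg is WriteMostly, both conditions yield one mark.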

Thanks,
Guoqing




