Re: [PATCH] md: submit MMP reads REQ_SYNC to bypass RAID5 cache

On Mon, 3 Nov 2014 14:01:10 -0700 James Simmons <uja.ornl@xxxxxxxxx> wrote:

> Hello.
> 
>    This is a patch against the latest kernel source which is based on
> a patch used by Lustre. The text below describes what we are trying to
> achieve. I would like feedback on whether this is the right approach.
> 
> ----------------------------------------------------------------------
> 
> The ext4 MMP block reads always need to get fresh data from the
> underlying disk.  Otherwise, if a remote node is updating the MMP
> block and the reads are fetched from the MD RAID5 stripe cache,
> it is possible that the local node will incorrectly decide the
> remote node has died and allow the filesystem to be mounted on
> two nodes at the same time.

It is preferred for patches to be inline, rather than as attachments, as it
makes it easier to comment on them....

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9c66e59..11b749c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2678,6 +2678,9 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
 		}
 		if (sector >= sh->dev[dd_idx].sector + STRIPE_SECTORS)
 			set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags);
+	} else if (bi->bi_rw & REQ_NOCACHE) {
+		/* force to read from underlying disk if requested */
+		clear_bit(R5_UPTODATE, &sh->dev[dd_idx].flags);
 	}
 
 	pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",


This doesn't provide a useful guarantee.  If the device that stores that
block has failed, md/raid5 will read all the other devices to recover the
block.
If that happened recently, the other blocks are still cached in the stripe.
If you then just clear the UPTODATE bit on the one block, md/raid5 will
recompute it from the cached copies of the other blocks, without re-reading
them from disk.
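
To make the failure case above concrete, here is a toy user-space model of
it.  This is only a sketch and not kernel code: struct toy_stripe,
toy_read(), remote_write_block0() and stale_read_demo() are all made-up
names, the "uptodate" flag merely stands in for R5_UPTODATE, and a stripe is
reduced to two one-byte data blocks plus XOR parity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NDISKS 3	/* block 0 and 1 are data, block 2 is XOR parity */

/* Hypothetical stand-in for one stripe: device contents plus cached copies. */
struct toy_stripe {
	uint8_t disk[NDISKS];	/* what is really stored on each device */
	uint8_t cache[NDISKS];	/* copy held in the stripe cache */
	bool uptodate[NDISKS];	/* models R5_UPTODATE for each block */
	bool failed[NDISKS];	/* device is unavailable */
};

/* Read block idx: use the cache if valid, else the disk, else rebuild it. */
uint8_t toy_read(struct toy_stripe *s, int idx)
{
	if (s->uptodate[idx])
		return s->cache[idx];

	if (!s->failed[idx]) {
		s->cache[idx] = s->disk[idx];
	} else {
		/* Rebuild from the peers; peers already up to date are NOT re-read. */
		uint8_t v = 0;
		for (int i = 0; i < NDISKS; i++) {
			if (i == idx)
				continue;
			if (!s->uptodate[i]) {
				s->cache[i] = s->disk[i];
				s->uptodate[i] = true;
			}
			v ^= s->cache[i];
		}
		s->cache[idx] = v;
	}
	s->uptodate[idx] = true;
	return s->cache[idx];
}

/* A remote node rewrites data block 0 and keeps the on-disk parity consistent. */
void remote_write_block0(struct toy_stripe *s, uint8_t val)
{
	if (!s->failed[0])
		s->disk[0] = val;
	s->disk[2] = val ^ s->disk[1];
}

/* Block 0 lives on a failed device; returns what the second local read sees. */
uint8_t stale_read_demo(void)
{
	struct toy_stripe s = {
		.disk = { 0x00, 0x22, 0x11 ^ 0x22 },	/* block 0 is logically 0x11 */
		.failed = { true, false, false },
	};
	uint8_t first = toy_read(&s, 0);	/* rebuilt from disk, caches the peers */

	assert(first == 0x11);
	remote_write_block0(&s, 0x99);		/* remote node updates the block */
	s.uptodate[0] = false;			/* what the patch does for the read */
	return toy_read(&s, 0);			/* rebuilt from the stale cached peers */
}
```

In this model stale_read_demo() returns the old value 0x11 rather than the
0x99 the remote node wrote, because only block 0 was invalidated and the
rebuild used the cached copies of block 1 and the parity block.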

But considering this at a higher level: if two different nodes try to
assemble the same RAID5 array then you already potentially have a problem.
You really want some sensible cluster co-ordinator to make these
decisions.   Hoping that a block device can be a reliable semaphore seems ...
misguided.

NeilBrown

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel