Re: [PATCH] mmp race

Andreas Dilger <adilger@xxxxxxxxx> · Fri, 25 May 2018 21:04:51 -0600

On Tue, May 22, 2018 at 09:59:07AM +0300, Artem Blagodarenko wrote:
> From: Vitaly Fertman <vitaly_fertman@xxxxxxxxxxx>
> 
> make sleep between reads twice in ext4_multi_mount_protect twice
> longer than between write and read, make the later one equal to the
> system check_interval

This commit message could be improved a bit, something like:

Make the sleep between initial reads in ext4_mmp_start() equal to
twice the maximum of the superblock and MMP block interval, to
ensure that the initial check waits long enough to detect if the
previous owner is still updating the MMP block.

If no activity is detected on the superblock, the sleep between
the MMP write and subsequent re-read is reduced to a single
interval to ensure that the final write completes at the stated
interval so that another node doesn't consider the filesystem idle.

> Xyratex-Bug-Id: MRP-390
> 
> Reviewed-by: Andrew Perepechko <Andrew_Perepechko@xxxxxxxxxxx>
> ---
> lib/ext2fs/mmp.c | 19 +++++++++++--------
> 1 file changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c
> index 9a771de7..cb968adf 100644
> --- a/lib/ext2fs/mmp.c
> +++ b/lib/ext2fs/mmp.c
> @@ -296,6 +296,13 @@ errcode_t ext2fs_mmp_start(ext2_filsys fs)
> 	if (mmp_check_interval < EXT4_MMP_MIN_CHECK_INTERVAL)
> 		mmp_check_interval = EXT4_MMP_MIN_CHECK_INTERVAL;
> 
> +	/*
> +	 * If check_interval in MMP block is larger, use that instead of
> +	 * check_interval from the superblock.
> +	 */
> +	if (mmp_s->mmp_check_interval > mmp_check_interval)
> +		mmp_check_interval = mmp_s->mmp_check_interval;

Moving this check up only really affects the "EXT4_MMP_SEQ_CLEAN" case
here, as all other code paths result in an error or they would have hit
the same code below.  In the "clean_seq" case, do we really want to
wait the full MMP interval, if the writing node was busy just before it
unmounted the filesystem?  I don't think this is bad, just potentially
slower than strictly necessary.

> 	seq = mmp_s->mmp_seq;
> 	if (seq == EXT4_MMP_SEQ_CLEAN)
> 		goto clean_seq;

> @@ -309,13 +316,6 @@ errcode_t ext2fs_mmp_start(ext2_filsys fs)
> 		goto mmp_error;
> 	}
> 
> -	/*
> -	 * If check_interval in MMP block is larger, use that instead of
> -	 * check_interval from the superblock.
> -	 */
> -	if (mmp_s->mmp_check_interval > mmp_check_interval)
> -		mmp_check_interval = mmp_s->mmp_check_interval;
> -
> 	sleep(2 * mmp_check_interval + 1);
> 
> 	retval = ext2fs_mmp_read(fs, fs->super->s_mmp_block, fs->mmp_buf);
> @@ -344,7 +344,10 @@ clean_seq:
> 	if (retval)
> 		goto mmp_error;
> 
> -	sleep(2 * mmp_check_interval + 1);
> +	/* This sleep between write & read must be shorter than the previous
> +	 * sleep between 2 reads, so that the check above of a racing thread
> +	 * would never succeed between this write & read. */
> +	sleep(mmp_check_interval + 1);

I think this change makes sense, as we want to update the MMP block a second
time after mmp_s->mmp_check_interval in order for other nodes to detect that
libext2fs is in control of the filesystem.

That said, I wonder if it also makes sense to add:

	if (mmp_s->mmp_check_interval < mmp_check_interval)
		mmp_s->mmp_check_interval = mmp_check_interval;

before the first ext2fs_mmp_write() to properly reflect the longer sleep
interval?  That would only happen if the superblock s_mmp_update_interval
is longer than mmp_s->mmp_check_interval, which shouldn't really be possible,
but is a bit safer.

In summary, while I think there might be some small improvements possible,
I think the current patch is OK, and could add my:

Signed-off-by: Andreas Dilger <adilger@xxxxxxxxx>

Cheers, Andreas

Attachment:
signature.asc

Description: Message signed with OpenPGP