Re: [PATCH 4/8] dmthin: erase the metadata device properly before starting

Eryu Guan <guan@xxxxxxx> · Sun, 18 Jul 2021 22:32:08 +0800

On Tue, Jul 06, 2021 at 05:21:28PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> 
> Every now and then I see the following failure when running generic/347:
> 
> --- generic/347.out
> +++ generic/347.out.bad
> @@ -1,2 +1,2 @@
>  QA output created by 347
> -=== completed
> +failed to create dm thin pool device

Note that I have to add space indention to above diff lines, otherwise
it seems git am will treat lines after "---" as the beginning of patch,
and failed to apply.

Thanks,
Eryu

> 
> Accompanied by the following dmesg spew:
> 
> device-mapper: thin metadata: sb_check failed: blocknr 7016996765293437281: wanted 0
> device-mapper: block manager: superblock validator check failed for block 0
> device-mapper: thin metadata: couldn't read superblock
> device-mapper: table: 253:2: thin-pool: Error creating metadata object
> device-mapper: ioctl: error adding target to table
> 
> 7016996765293437281 is of course the magic number 0x6161616161616161,
> which are stale ondisk contents left behind by previous tests that wrote
> known tests patterns to files on the scratch device.  This is a bit
> surprising, since _dmthin_init supposedly zeroes the first 4k of the
> thin pool metadata device before initializing the pool.  Or does it?
> 
> dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 &>/dev/null
> 
> Herein lies the problem: the dd process writes zeroes into the page
> cache and exits.  Normally the block layer will flush the page cache
> after the last file descriptor is closed, but once in a while the
> terminating dd process won't be the only process in the system with an
> open file descriptor!
> 
> That process is of course udev.  The write() call from dd triggers a
> kernel uevent, which starts udev.  If udev is running particularly
> slowly, it'll still be running an instant later when dd terminates,
> thereby preventing the page cache flush.  If udev is still running a
> moment later when we call dmsetup to set up the thin pool, the pool
> creation will issue a bio to read the ondisk superblock.  This read
> isn't coherent with the page cache, so it sees old disk contents and the
> test fails even though we supposedly formatted the metadata device.
> 
> Fix this by explicitly flushing the page cache after writing the zeroes.
> 
> Fixes: 4b52fffb ("dm-thinp helpers in common/dmthin")
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> ---
>  common/dmthin |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/common/dmthin b/common/dmthin
> index 3b1c7d45..91147e47 100644
> --- a/common/dmthin
> +++ b/common/dmthin
> @@ -113,8 +113,12 @@ _dmthin_init()
>  	_dmsetup_create $DMTHIN_DATA_NAME --table "$DMTHIN_DATA_TABLE" || \
>  		_fatal "failed to create dm thin data device"
>  
> -	# Zap the pool metadata dev
> -	dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 &>/dev/null
> +	# Zap the pool metadata dev.  Explicitly fsync the zeroes to disk
> +	# because a slow-running udev running concurrently with dd can maintain
> +	# an open file descriptor.  The block layer only flushes the page cache
> +	# on last close, which means that the thin pool creation below will
> +	# see the (stale) ondisk contents and fail.
> +	dd if=/dev/zero of=$DMTHIN_META_DEV bs=4096 count=1 conv=fsync &>/dev/null
>  
>  	# Thin pool
>  	# "start length thin-pool metadata_dev data_dev data_block_size low_water_mark"