Re: RO mount of ext4 filesystem causes writes

Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx> · Fri, 23 Jun 2023 21:08:37 +0530

"Theodore Ts'o" <tytso@xxxxxxx> writes:

> On Thu, Jun 22, 2023 at 11:18:04PM -0700, Sean Greenslade wrote:
>> I perhaps should have been more explicit in my report. The issue is not
>> that the image is any different after the mount; indeed, the md5sums are
>> identical before and after on my machine as well. The issue is that
>> something is issuing writes to the backing image, which bumps the mtime
>> of the backing image. When handling the images with rsync, a difference
>> in mtime causes the whole image to need to be read.
>
> Ah, yes, your initial report said "small writes", but it didn't
> specify whether the issue was that writes were modifying the image, or
> just simply touching the mtime field of the backing file.  I assume
> these must be largish fs images, since it must have made the increased
> rsync time noticeable?
>
> This appears to fix the problem for me, given the clarified
> reproduction information.  Could you please try it on your end?
>
> 	     		   	     	    - Ted
>
> From 6bb438fa0aac4c08acd626d408cb6d4b745df7fd Mon Sep 17 00:00:00 2001
> From: Theodore Ts'o <tytso@xxxxxxx>
> Date: Fri, 23 Jun 2023 10:18:51 -0400
> Subject: [PATCH] ext4: avoid updating the superblock on a r/o mount if not
>  needed
>
> This was reported by a user who noticed that the mtime of a file
> backing a loopback device was getting bumped when the loopback device
> is mounted read/only.  Note: This doesn't show up when doing a
> loopback mount of a file directly, via "mount -o ro /tmp/foo.img
> /mnt", since the loop device is set read-only when mount automatically
> creates loop device.  However, this is noticeable for a LUKS loop
> device like this:
>
> % cryptsetup luksOpen /tmp/foo.img test
> % mount -o ro /dev/loop0 /mnt ; umount /mnt
>
> or, if LUKS is not in use, if the user manually creates the loop
> device like this:
>
> % losetup /dev/loop0 /tmp/foo.img
> % mount -o ro /dev/loop0 /mnt ; umount /mnt
>
> The modified mtime causes rsync to do a rolling checksum scan of the
> file on the local and remote side, incrementally increasing the time
> to rsync the not-modified-but-touched image file.
>
> Fixes: eee00237fa5e ("ext4: commit super block if fs record error when journal record without error")
> Cc: stable@xxxxxxxxxx
> Link: https://lore.kernel.org/r/ZIauBR7YiV3rVAHL@glitch
> Reported-by: Sean Greenslade <sean@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Theodore Ts'o <tytso@xxxxxxx>
> ---
>  fs/ext4/super.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index b3819e70093e..c638b0db3b2b 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -5997,19 +5997,27 @@ static int ext4_load_journal(struct super_block *sb,
>  		err = jbd2_journal_wipe(journal, !really_read_only);
>  	if (!err) {
>  		char *save = kmalloc(EXT4_S_ERR_LEN, GFP_KERNEL);
> +		__le16 orig_state;
> +		bool changed = false;
>  
>  		if (save)
>  			memcpy(save, ((char *) es) +
>  			       EXT4_S_ERR_START, EXT4_S_ERR_LEN);
>  		err = jbd2_journal_load(journal);
> -		if (save)
> +		if (save && memcmp(((char *) es) + EXT4_S_ERR_START,
> +				   save, EXT4_S_ERR_LEN)) {
>  			memcpy(((char *) es) + EXT4_S_ERR_START,
>  			       save, EXT4_S_ERR_LEN);
> +			changed = true;
> +		}

It seems that the original code was trying to preserve the error
information area of the superblock across the journal load (though I am
not sure why).

In the new code we check whether the journal load changed that area, and
if so we restore the original contents, but we also set changed = true.
Why?
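
For reference, the save/compare/restore pattern in the hunk above can be
sketched in userspace roughly as follows. This is only an illustration
under stated assumptions: toy_super, toy_journal_load(),
load_preserving_err_area() and ERR_AREA_LEN are hypothetical stand-ins,
not kernel names, and the "journal load" here simply zeroes the area
when asked to.

```c
#include <stdbool.h>
#include <string.h>

#define ERR_AREA_LEN 16 /* hypothetical size, stands in for EXT4_S_ERR_LEN */

/* Hypothetical stand-in for the error area of the on-disk superblock. */
struct toy_super {
	unsigned char err_area[ERR_AREA_LEN];
};

/* Hypothetical journal load that optionally clobbers the error area. */
static void toy_journal_load(struct toy_super *es, bool clobber)
{
	if (clobber)
		memset(es->err_area, 0, ERR_AREA_LEN);
}

/*
 * Save the error area, run the load, and copy the saved bytes back only
 * if the load modified them. Returns true when a restore happened.
 */
static bool load_preserving_err_area(struct toy_super *es, bool clobber)
{
	unsigned char save[ERR_AREA_LEN];
	bool changed = false;

	memcpy(save, es->err_area, ERR_AREA_LEN);
	toy_journal_load(es, clobber);
	if (memcmp(es->err_area, save, ERR_AREA_LEN) != 0) {
		memcpy(es->err_area, save, ERR_AREA_LEN);
		changed = true;
	}
	return changed;
}
```

In this sketch the return value is true only when the load modified the
area and it had to be put back; when nothing was touched, nothing is
copied and the flag stays false.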

>  		kfree(save);
> +		orig_state = es->s_state;
>  		es->s_state |= cpu_to_le16(EXT4_SB(sb)->s_mount_state &
>  					   EXT4_ERROR_FS);
> +		if (orig_state != es->s_state)
> +			changed = true;
>  		/* Write out restored error information to the superblock */
> -		if (!bdev_read_only(sb->s_bdev)) {
> +		if (changed && !really_read_only) {
>  			int err2;
>  			err2 = ext4_commit_super(sb);
>  			err = err ? : err2;

Yes, this makes sense. Earlier we always called ext4_commit_super()
(whenever the bdev was not read-only), even if es->s_state hadn't
changed. With this change we only call ext4_commit_super() when
something actually changed, e.g. es->s_state differs from orig_state.
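
A rough userspace sketch of the "only write when something changed"
logic in this hunk, as I read it. All names here (toy_sb,
toy_commit_super(), finish_load(), TOY_ERROR_FS) are hypothetical, and
the commit is simulated with a counter rather than any real I/O.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ERROR_FS 0x0002 /* illustrative flag, stands in for EXT4_ERROR_FS */

struct toy_sb {
	uint16_t disk_state;  /* stands in for es->s_state */
	uint16_t mount_state; /* stands in for EXT4_SB(sb)->s_mount_state */
	int commits;          /* number of simulated superblock writes */
};

/* Simulated superblock write: just count it. */
static void toy_commit_super(struct toy_sb *sb)
{
	sb->commits++;
}

/*
 * Fold the in-memory error flag into the on-disk state, and write the
 * superblock only if something changed and the mount is not read-only.
 * 'changed' carries in whether the error area had to be restored.
 */
static void finish_load(struct toy_sb *sb, bool changed, bool really_read_only)
{
	uint16_t orig_state = sb->disk_state;

	sb->disk_state |= (uint16_t)(sb->mount_state & TOY_ERROR_FS);
	if (orig_state != sb->disk_state)
		changed = true;
	if (changed && !really_read_only)
		toy_commit_super(sb);
}
```

With a clean state and changed == false, no write is issued at all,
which is the case that matters for the untouched-mtime report.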

-ritesh

> -- 
> 2.31.0