Re: [PATCH] ext4: fix warning when submitting superblock in ext4_commit_super()

Ritesh Harjani <ritesh.list@xxxxxxxxx> · Thu, 19 May 2022 11:59:29 +0530

On 22/05/19 11:13AM, Zhang Yi wrote:
> On 2022/5/19 1:06, Ritesh Harjani wrote:
> > On 22/05/18 10:10PM, Zhang Yi wrote:
> >> We have already check the io_error and uptodate flag before submitting
> >> the superblock buffer, and re-set the uptodate flag if it has been
> >> failed to write out. But it was lockless and could be raced by another
> >> ext4_commit_super(), and finally trigger '!uptodate' WARNING when
> >> marking buffer dirty. Fix it by submit buffer directly.
> >
> > I agree that there could be a race with multiple processes trying to call
> > ext4_commit_super(). Do you have a easy reproducer for this issue?
> >
>
> Sorry, I don't have a easy reproducer, but we can always reproduce it through
> inject delay and add filters into the ext4_commit_super().

Sure, thanks for sharing.

>
> 1. Apply below diff.
>  static int ext4_commit_super(struct super_block *sb)
>  {
>  	struct buffer_head *sbh = EXT4_SB(sb)->s_sbh;
> @@ -6026,9 +6027,22 @@ static int ext4_commit_super(struct super_block *sb)
>  		set_buffer_uptodate(sbh);
>  	}
>  	BUFFER_TRACE(sbh, "marking dirty");
> +	if (!strcmp(current->comm, "touch"))
> +		pr_err("touch (%d) enter\n", current->pid);
> +	if (!strcmp(current->comm, "mkdir")) {
> +		pr_err("mkdir(%d): wait touch sync\n", current->pid);
> +		msleep(1000);
> +		pr_err("mkdir(%d): wait touch sync %d\n", current->pid, buffer_uptodate(sbh));
> +	}
>  	mark_buffer_dirty(sbh);
> +	if (!strcmp(current->comm, "mkdir"))
> +		pr_err("mkdir(%d): mark\n", current->pid);
>  	error = __sync_dirty_buffer(sbh,
>  		REQ_SYNC | (test_opt(sb, BARRIER) ? REQ_FUA : 0));
> +	if (error) {
> +		pr_err("%s(%d) sync fail %d\n", current->comm, current->pid, buffer_uptodate(sbh));
> +		msleep(2000);
> +	}
>  	if (buffer_write_io_error(sbh)) {
>  		ext4_msg(sb, KERN_ERR, "I/O error while writing "
>  		       "superblock");
>
> 2. Run this script.
> #!/bin/bash
> echo running > /sys/block/sdb/device/state
> sleep 1
> umount /mnt
> mkfs.ext4 -F -E lazy_itable_init=0,lazy_journal_init=0 /dev/sdb
> mount /dev/sdb -o errors=remount-ro,stripe=2048,data_err=abort /mnt
> mkdir /mnt/dir_a
> mkdir -p /mnt/dir_b
>
> sync
> sync
>
> echo 3 > /proc/sys/vm/drop_caches
> echo offline > /sys/block/sdb/device/state
>
> sleep 1
> mkdir /mnt/dir_a/a &
> touch /mnt/dir_b/b
>
>
> [ 1586.472287] ------------[ cut here ]------------
> [ 1586.473834] WARNING: CPU: 14 PID: 1425 at fs/buffer.c:1081 mark_buffer_dirty+0x28f/0x330
> [ 1586.476519] Modules linked in:
> [ 1586.477567] CPU: 14 PID: 1425 Comm: mkdir Not tainted 5.18.0-rc7-dirty #745
> [ 1586.479854] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc4
> [ 1586.482709] RIP: 0010:mark_buffer_dirty+0x28f/0x330
> [ 1586.483400] Code: a8 00 00 00 48 83 05 8f dd 0d 03 01 48 83 e8 01 e9 df fe ff ff 48 83 05 fe e1 0d 033
> [ 1586.488136] RSP: 0018:ffffa8a6c0ef3b90 EFLAGS: 00010202
> [ 1586.490142] RAX: 0000000000116418 RBX: ffff93f5bd899000 RCX: 0000000000000000
> [ 1586.492571] RDX: 0000000000000000 RSI: ffffffff8bef9549 RDI: ffff93f5bd899000
> [ 1586.494988] RBP: ffff93f5beffd000 R08: 0000000000000000 R09: ffffa8a6c0ef39c0
> [ 1586.497380] R10: 0000000000000001 R11: 0000000000000001 R12: ffff93f5b3de0000
> [ 1586.499674] R13: 0000000000000000 R14: ffffffff8b849da0 R15: 0000000000000000
> [ 1586.501964] FS:  00007f561455c0c0(0000) GS:ffff93fc65980000(0000) knlGS:0000000000000000
> [ 1586.504493] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1586.506303] CR2: 00007f5614706f80 CR3: 0000000105534000 CR4: 00000000000006e0
> [ 1586.508561] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1586.509652] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1586.510654] Call Trace:
> [ 1586.511560]  <TASK>
> [ 1586.512228]  ext4_commit_super+0xb1/0x2e0
> [ 1586.513362]  ext4_handle_error+0x287/0x2a0
> [ 1586.514508]  __ext4_error+0x138/0x240
> [ 1586.515527]  ? __might_sleep+0x56/0xb0
> [ 1586.516571]  ? __getblk_gfp+0x47/0x630
> [ 1586.517636]  ext4_journal_check_start+0xd1/0xf0
> [ 1586.518884]  __ext4_journal_start_sb+0x61/0x1f0
> [ 1586.520126]  __ext4_new_inode+0x12ee/0x2670
> [ 1586.521283]  ? ext4_lookup+0x297/0x340
> [ 1586.522322]  ext4_mkdir+0x1a5/0x4f0
> [ 1586.523298]  vfs_mkdir+0x7c/0x1b0
> [ 1586.523981]  do_mkdirat+0x9e/0x160
> [ 1586.524488]  __x64_sys_mkdir+0x41/0x60
> [ 1586.525054]  do_syscall_64+0x3b/0x90
> [ 1586.525590]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 1586.526341] RIP: 0033:0x7f5614710ecb
>
> > Also do you think something like below should fix the problem too?
> > So if you lock the buffer from checking until marking the buffer dirty, that
> > should avoid the race too that you are reporting.
> > Thoughts?
> >
>
> Thanks for your suggestion. I've thought about this solution and yes it's simpler
> to fix the race, but I think we lock and unlock the sbh several times just for
> calling standard buffer write helpers is not so good. Opencode the submit
> procedure looks more clear to me.

I agree your solution was cleaner since it does not has a lot of lock/unlock.
My suggestion came in from looking at the history.
This lock was added here [1] and I think it somehow got removed in this patch[2]

[1]: https://lore.kernel.org/linux-ext4/1467285150-15977-2-git-send-email-pranjas@xxxxxxxxx/
[2]: https://lore.kernel.org/linux-ext4/20201216101844.22917-5-jack@xxxxxxx/

Rather then solutions, I had few queries :)
1. What are the implications of not using mark_buffer_dirty()/__sync_dirty_buffer()
2. In your solution one thing which I was not clear of, was whether we should
	call clear_buffer_dirty() before calling submit_bh(), in case if somehow(?)
	the state of the buffer was already marked dirty? Not sure how this can
	happen, but I see the logic in mark_buffer_dirty() which checks, if the
	buffer is already marked dirty, it simply returns. Then __sync_dirty_buffer()
	clears the buffer dirty state.

> Anyway, Your solution is also fine by me.

I think if we get some answers to above. It will give us more confidence on
whether should we open code submit_bh() logic or should we use
mark_buffer_dirty()/__sync_dirty_buffer() (with lock_buffer() to prevent the
warning which you reported).

-ritesh

>
> Thanks,
> Yi.
>
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index 6900da973ce2..3447841fe654 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -6007,6 +6007,7 @@ static int ext4_commit_super(struct super_block *sb)
> >
> >         ext4_update_super(sb);
> >
> > +       lock_buffer(sbh);
> >         if (buffer_write_io_error(sbh) || !buffer_uptodate(sbh)) {
> >                 /*
> >                  * Oh, dear.  A previous attempt to write the
> > @@ -6023,6 +6024,7 @@ static int ext4_commit_super(struct super_block *sb)
> >         }
> >         BUFFER_TRACE(sbh, "marking dirty");
> >         mark_buffer_dirty(sbh);
> > +       unlock_buffer(sbh);
> >         error = __sync_dirty_buffer(sbh,
> >                 REQ_SYNC | (test_opt(sb, BARRIER) ? REQ_FUA : 0));
> >         if (buffer_write_io_error(sbh)) {
> > .
> >