Re: [PATCH -next] ext4: Fix BUG_ON in ext4_bread when write quota data

Lukas Czerner <lczerner@xxxxxxxxxx> · Wed, 22 Dec 2021 15:47:12 +0100

On Wed, Dec 22, 2021 at 03:37:57PM +0100, Jan Kara wrote:
> On Wed 22-12-21 09:35:37, Ye Bin wrote:
> > We got issue as follows when run syzkaller:
> > [  167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user
> > [  167.938306] EXT4-fs (loop0): Remounting filesystem read-only
> > [  167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0'
> > [  167.983601] ------------[ cut here ]------------
> > [  167.984245] kernel BUG at fs/ext4/inode.c:847!
> > [  167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
> > [  167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G    B             5.16.0-rc5-next-20211217+ #123
> > [  167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
> > [  167.988590] RIP: 0010:ext4_getblk+0x17e/0x504
> > [  167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244
> > [  167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282
> > [  167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000
> > [  167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66
> > [  167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09
> > [  167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8
> > [  167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001
> > [  167.997158] FS:  00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000
> > [  167.998238] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0
> > [  167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [  168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [  168.001899] Call Trace:
> > [  168.002235]  <TASK>
> > [  168.007167]  ext4_bread+0xd/0x53
> > [  168.007612]  ext4_quota_write+0x20c/0x5c0
> > [  168.010457]  write_blk+0x100/0x220
> > [  168.010944]  remove_free_dqentry+0x1c6/0x440
> > [  168.011525]  free_dqentry.isra.0+0x565/0x830
> > [  168.012133]  remove_tree+0x318/0x6d0
> > [  168.014744]  remove_tree+0x1eb/0x6d0
> > [  168.017346]  remove_tree+0x1eb/0x6d0
> > [  168.019969]  remove_tree+0x1eb/0x6d0
> > [  168.022128]  qtree_release_dquot+0x291/0x340
> > [  168.023297]  v2_release_dquot+0xce/0x120
> > [  168.023847]  dquot_release+0x197/0x3e0
> > [  168.024358]  ext4_release_dquot+0x22a/0x2d0
> > [  168.024932]  dqput.part.0+0x1c9/0x900
> > [  168.025430]  __dquot_drop+0x120/0x190
> > [  168.025942]  ext4_clear_inode+0x86/0x220
> > [  168.026472]  ext4_evict_inode+0x9e8/0xa22
> > [  168.028200]  evict+0x29e/0x4f0
> > [  168.028625]  dispose_list+0x102/0x1f0
> > [  168.029148]  evict_inodes+0x2c1/0x3e0
> > [  168.030188]  generic_shutdown_super+0xa4/0x3b0
> > [  168.030817]  kill_block_super+0x95/0xd0
> > [  168.031360]  deactivate_locked_super+0x85/0xd0
> > [  168.031977]  cleanup_mnt+0x2bc/0x480
> > [  168.033062]  task_work_run+0xd1/0x170
> > [  168.033565]  do_exit+0xa4f/0x2b50
> > [  168.037155]  do_group_exit+0xef/0x2d0
> > [  168.037666]  __x64_sys_exit_group+0x3a/0x50
> > [  168.038237]  do_syscall_64+0x3b/0x90
> > [  168.038751]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > 
> > In order to reproduce this problem, the following conditions need to be met:
> > 1. Ext4 filesystem with no journal;
> > 2. Filesystem image with incorrect quota data;
> > 3. Abort filesystem forced by user;
> > 4. umount filesystem;
> > 
> > As in ext4_quota_write:
> > ...
> >          if (EXT4_SB(sb)->s_journal && !handle) {
> >                  ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
> >                          " cancelled because transaction is not started",
> >                          (unsigned long long)off, (unsigned long long)len);
> >                  return -EIO;
> >          }
> > ...
> > We only check handle if NULL when filesystem has journal. There is need
> > check handle if NULL even when filesystem has no journal.
> 
> Hum, so I think we can just drop this whole condition and warning and
> instead add a silent return of -EROFS if the filesystem is readonly. That
> should also fix the bug and also make aborted case less noisy. The message
> from ext4_quota_write() IMO is not very useful and ext4_bread() will
> provide us with a BUG and stacktrace which is better for debugging what
> went wrong anyway.

Can we really only get here when the fs is read only ? We're going to
end up with NULL handle here any time the ext4_journal_start() fails in
ext4_release_dquot() which I think can fail for plenty of reasons, not
just EROFS.

-Lukas

> 
> 								Honza
> 
> > 
> > Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
> > ---
> >  fs/ext4/super.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index 071b7b3c5678..c8ca5811ea65 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -6955,9 +6955,10 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
> >  	struct buffer_head *bh;
> >  	handle_t *handle = journal_current_handle();
> >  
> > -	if (EXT4_SB(sb)->s_journal && !handle) {
> > +	if (!handle) {
> >  		ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
> > -			" cancelled because transaction is not started",
> > +			" cancelled because transaction is not started"
> > +			" or filesystem is abort forced by user",
> >  			(unsigned long long)off, (unsigned long long)len);
> >  		return -EIO;
> >  	}
> > -- 
> > 2.31.1
> > 
> -- 
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR
>