Re: [syzbot] BUG: unable to handle kernel NULL pointer dereference in set_page_dirty

Jaegeuk Kim <jaegeuk@xxxxxxxxxx> · Mon, 29 Aug 2022 11:34:14 -0700

On 08/29, Matthew Wilcox wrote:
> On Mon, Aug 29, 2022 at 10:52:57AM -0700, Jaegeuk Kim wrote:
> > On 08/25, Andrew Morton wrote:
> > > (cc f2fs developers)
> > > 
> > > On Thu, 25 Aug 2022 08:29:32 -0700 syzbot <syzbot+775a3440817f74fddb8c@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > > 
> > > > Hello,
> > > > 
> > > > syzbot found the following issue on:
> > > > 
> > > > HEAD commit:    a41a877bc12d Merge branch 'for-next/fixes' into for-kernelci
> > > > git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=175def47080000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=5cea15779c42821c
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=775a3440817f74fddb8c
> > > > compiler:       Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> > > > userspace arch: arm64
> > > > 
> > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > 
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+775a3440817f74fddb8c@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > > 
> > > > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> > > > Mem abort info:
> > > >   ESR = 0x0000000086000005
> > > >   EC = 0x21: IABT (current EL), IL = 32 bits
> > > >   SET = 0, FnV = 0
> > > >   EA = 0, S1PTW = 0
> > > >   FSC = 0x05: level 1 translation fault
> > > > user pgtable: 4k pages, 48-bit VAs, pgdp=00000001249cc000
> > > > [0000000000000000] pgd=080000012ee65003, p4d=080000012ee65003, pud=0000000000000000
> > > > Internal error: Oops: 86000005 [#1] PREEMPT SMP
> > > > Modules linked in:
> > > > CPU: 0 PID: 3044 Comm: syz-executor.0 Not tainted 6.0.0-rc2-syzkaller-16455-ga41a877bc12d #0
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/20/2022
> > > > pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > > pc : 0x0
> > > > lr : folio_mark_dirty+0xbc/0x208 mm/page-writeback.c:2748
> > > > sp : ffff800012803830
> > > > x29: ffff800012803830 x28: ffff0000d02c8000 x27: 0000000000000009
> > > > x26: 0000000000000001 x25: 0000000000000a00 x24: 0000000000000080
> > > > x23: 0000000000000000 x22: ffff0000ef276c00 x21: 05ffc00000000007
> > > > x20: ffff0000f14b83b8 x19: fffffc00036409c0 x18: fffffffffffffff5
> > > > x17: ffff80000dd7a698 x16: ffff80000dbb8658 x15: 0000000000000000
> > > > x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> > > > x11: ff808000083e9814 x10: 0000000000000000 x9 : ffff8000083e9814
> > > > x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
> > > > x5 : ffff0000d9028000 x4 : ffff0000d5c31000 x3 : ffff0000d9027f80
> > > > x2 : fffffffffffffff0 x1 : fffffc00036409c0 x0 : ffff0000f14b83b8
> > > > Call trace:
> > > >  0x0
> > > >  set_page_dirty+0x38/0xbc mm/folio-compat.c:62
> > 
> > void f2fs_update_meta_page(struct f2fs_sb_info *sbi,
> >                                         void *src, block_t blk_addr)
> > {
> >         struct page *page = f2fs_grab_meta_page(sbi, blk_addr);
> > 
> > --> f2fs_grab_meta_page() returns a page that grab_cache_page() has locked.
> > 
> >         memcpy(page_address(page), src, PAGE_SIZE);
> >         set_page_dirty(page);
> >         f2fs_put_page(page, 1);
> > }
> > 
> > Was there a related change in the folio code?
> 
> Not directly, but there was a related change, 0af573780b0b, which
> requires aops->set_page_dirty to be set; is that perhaps missing?
> I don't see one in f2fs_compress_aops, for example.

Do you mean dirty_folio? I think all aops have it except the compressed one,
which we never mark dirty.

> 
> The other possibility is that it's a mapping that is missing its ->a_ops.
> Is that something f2fs ever does?

Hmm, no, I haven't seen this before, and we set aops when mounting the
file system. Ah, but if this happens on a corrupted image, then maybe. I need
to check the error path in f2fs_fill_super.

> 
> I only managed to narrow down the crash to the line:
>                 return mapping->a_ops->dirty_folio(mapping, folio);
> so either mapping->a_ops is NULL or mapping->a_ops->dirty_folio is
> NULL.  The report came from ARM, and unlike x86, ARM doesn't emit a
> 'Code:' line.