Re: [syzbot] [btrfs?] kernel BUG in __folio_start_writeback

Qu Wenruo <wqu@xxxxxxxx> · Tue, 26 Nov 2024 19:13:37 +1030

在 2024/11/25 21:14, Aleksandr Nogikh 写道:
On Mon, Nov 25, 2024 at 1:30 AM 'Qu Wenruo' via syzkaller-bugs
<syzkaller-bugs@xxxxxxxxxxxxxxxx> wrote:

在 2024/11/25 07:56, Matthew Wilcox 写道:
On Sun, Nov 24, 2024 at 05:45:18AM -0800, syzbot wrote:

   __fput+0x5ba/0xa50 fs/file_table.c:458
   task_work_run+0x24f/0x310 kernel/task_work.c:239
   resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
   exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
   __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
   syscall_exit_to_user_mode+0x13f/0x340 kernel/entry/common.c:218
   do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

This is:

          VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);

ie we've called __folio_start_writeback() on a folio which is already
under writeback.

Higher up in the trace, we have the useful information:

   page: refcount:6 mapcount:0 mapping:ffff888077139710 index:0x3 pfn:0x72ae5
   memcg:ffff888140adc000
   aops:btrfs_aops ino:105 dentry name(?):"file2"
   flags: 0xfff000000040ab(locked|waiters|uptodate|lru|private|writeback|node=0|zone=1|lastcpupid=0x7ff)
   raw: 00fff000000040ab ffffea0001c8f408 ffffea0000939708 ffff888077139710
   raw: 0000000000000003 0000000000000001 00000006ffffffff ffff888140adc000
   page dumped because: VM_BUG_ON_FOLIO(folio_test_writeback(folio))
   page_owner tracks the page as allocated

The interesting part of the page_owner stacktrace is:

    filemap_alloc_folio_noprof+0xdf/0x500
    __filemap_get_folio+0x446/0xbd0
    prepare_one_folio+0xb6/0xa20
    btrfs_buffered_write+0x6bd/0x1150
    btrfs_direct_write+0x52d/0xa30
    btrfs_do_write_iter+0x2a0/0x760
    do_iter_readv_writev+0x600/0x880
    vfs_writev+0x376/0xba0

(ie not very interesting)

Workqueue: btrfs-delalloc btrfs_work_helper
RIP: 0010:__folio_start_writeback+0xc06/0x1050 mm/page-writeback.c:3119
Call Trace:
   <TASK>
   process_one_folio fs/btrfs/extent_io.c:187 [inline]
   __process_folios_contig+0x31c/0x540 fs/btrfs/extent_io.c:216
   submit_one_async_extent fs/btrfs/inode.c:1229 [inline]
   submit_compressed_extents+0xdb3/0x16e0 fs/btrfs/inode.c:1632
   run_ordered_work fs/btrfs/async-thread.c:245 [inline]
   btrfs_work_helper+0x56b/0xc50 fs/btrfs/async-thread.c:324
   process_one_work kernel/workqueue.c:3229 [inline]

This looks like a race?

process_one_folio() calls
btrfs_folio_clamp_set_writeback calls
btrfs_subpage_set_writeback:

          spin_lock_irqsave(&subpage->lock, flags);
          bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits)
;
          if (!folio_test_writeback(folio))
                  folio_start_writeback(folio);
          spin_unlock_irqrestore(&subpage->lock, flags);

so somebody else set writeback after we tested for writeback here.

The test VM is using X86_64, thus we won't go into the subpage routine,
but directly call folio_start_writeback().

One thing that comes to mind is that _usually_ we take folio_lock()
first, then start writeback, then call folio_unlock() and btrfs isn't
doing that here (afaict).  Maybe that's not the source of the bug?

We still hold the folio locked, do submission then unlock.

You can check extent_writepage(), where at the entrance we check if the
folio is still locked.
Then inside extent_writepage_io() we do the submission, setting the
folio writeback inside submit_one_sector().
Eventually unlock the folio at the end of extent_writepage(), that's for
the uncompressed writes.

There are a lot of special handling for async submission (compression),
but it  still holds the folio locked, do compression and submission, and
unlock, just all in another thread (this case).

So it looks like something is wrong when transferring the ownership of
the page cache folios to the compression path, or some not properly
handled error path.

Unfortunately I'm not really able to reproduce the case using the
reproducer...

I've just tried to reproduce locally using the downloadable assets and
the kernel crashed ~ after 1 minute of running the attached C repro.

[   87.616440][ T9044] ------------[ cut here ]------------
[   87.617126][ T9044] kernel BUG at mm/page-writeback.c:3119!
[   87.619308][ T9044] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[   87.620174][ T9044] CPU: 1 UID: 0 PID: 9044 Comm: kworker/u10:6 Not
tainted 6.12.0-syzkaller-08446-g228a1157fb9f #0

Here are the instructions I followed:
https://github.com/google/syzkaller/blob/master/docs/syzbot_assets.md#run-a-c-reproducer

Thanks for the confirmation.

I can reproduce it using the exact disk image (around 1min), but not 
inside my usual development VM (over 5min).

So it will a lot tricky to debug now...

Thanks,
Qu