Re: [syzbot] [mm?] WARNING in zswap_folio_swapin

Nhat Pham <nphamcs@xxxxxxxxx> · Sun, 4 Feb 2024 19:48:38 -0800

On Sat, Feb 3, 2024 at 6:59 PM Chengming Zhou <chengming.zhou@xxxxxxxxx> wrote:
>
> On 2024/2/4 09:28, Nhat Pham wrote:
> > On Sat, Feb 3, 2024 at 12:37 PM syzbot
> > <syzbot+17a611d10af7d18a7092@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> Hello,
> >>
> >> syzbot found the following issue on:
> >>
> >> HEAD commit:    861c0981648f Merge tag 'jfs-6.8-rc3' of github.com:kleikam..
> >> git tree:       upstream
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=174537bbe80000
> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=b168fa511db3ca08
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=17a611d10af7d18a7092
> >> compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> >> userspace arch: i386
> >>
> >> Unfortunately, I don't have any reproducer for this issue yet.
> >>
> >> Downloadable assets:
> >> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7bc7510fe41f/non_bootable_disk-861c0981.raw.xz
> >> vmlinux: https://storage.googleapis.com/syzbot-assets/b2b204c7b4a0/vmlinux-861c0981.xz
> >> kernel image: https://storage.googleapis.com/syzbot-assets/170ec316e557/bzImage-861c0981.xz
> >>
> >> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >> Reported-by: syzbot+17a611d10af7d18a7092@xxxxxxxxxxxxxxxxxxxxxxxxx
> >>
> >>  kcov_ioctl+0x4f/0x720 kernel/kcov.c:704
> >>  __do_compat_sys_ioctl+0x2bf/0x330 fs/ioctl.c:971
> >>  do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
> >>  __do_fast_syscall_32+0x79/0x110 arch/x86/entry/common.c:321
> >> page has been migrated, last migrate reason: compaction
> >> ------------[ cut here ]------------
> >> WARNING: CPU: 2 PID: 5104 at include/linux/memcontrol.h:775 folio_lruvec include/linux/memcontrol.h:775 [inline]
> >> WARNING: CPU: 2 PID: 5104 at include/linux/memcontrol.h:775 zswap_folio_swapin+0x47d/0x5a0 mm/zswap.c:381
> >> Modules linked in:
> >> CPU: 2 PID: 5104 Comm: syz-fuzzer Not tainted 6.8.0-rc2-syzkaller-00031-g861c0981648f #0
> >> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> >> RIP: 0010:folio_lruvec include/linux/memcontrol.h:775 [inline]
> >
> > Hmm looks like it's this line:
> > VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled(), folio);
> >
> > Looks like memcg was cleared from the folio. Haven't looked too
> > closely yet, but this (and the "page has been migrated" line above)
> > suggests maybe there is some migration business going on -
> > mem_cgroup_migrate() clears the old folio's memcg_data (via
> > old->memcg_data = 0).
>
> Yeah, I think it's this case.
>
> >
> > Here's my theory (which could be wrong - someone please fact-check
> > me): swap_read_folio(), which precedes zswap_folio_swapin(), unlocks
>
> And another case is !page_allocated, where the returned folio is unlocked, right?

I think you're correct. That said, it's probably fine to leave the
protection size unchanged if we find the folio in the swapcache anyway:
IIUC, we are not performing a swapin in that case (!page_allocated
means swap_read_folio() is never called), and a swapin is the scenario
the heuristic cares about :)

IOW, something like this:

if (unlikely(page_allocated)) {
    zswap_folio_swapin(folio);
    swap_read_folio(folio, false, NULL);
}

makes sense to me, from both the correctness POV and the heuristics POV.

>
> > the folio. Could this be sufficient to allow for migration? If this is
>
> IMHO, holding the folio lock is sufficient to prevent concurrent memcg migration.
>
> > the case, all we need to do is move this to above swap_read_folio(),
> > while the folio is still locked. __read_swap_cache_async() already
> > charges the folio to a memcg, so there's no need to wait until after
> > swap_read_folio() anyway.
>
> Should we call zswap_folio_swapin() in the !page_allocated case?
>
> Thanks.