Re: [syzbot] [xfs?] INFO: task hung in xfs_ail_push_all_sync (2)

Aleksandr Nogikh <nogikh@xxxxxxxxxx> · Fri, 18 Oct 2024 12:13:33 +0200

Hi Dave,

On Thu, Oct 17, 2024 at 2:53 AM 'Dave Chinner' via syzkaller-bugs
<syzkaller-bugs@xxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Oct 16, 2024 at 04:22:27PM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    09f6b0c8904b Merge tag 'linux_kselftest-fixes-6.12-rc3' of..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14af3fd0580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=7cd9e7e4a8a0a15b
> > dashboard link: https://syzkaller.appspot.com/bug?extid=611be8174be36ca5dbc9
> > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16c7705f980000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14d2fb27980000
>

It's better to just leave the issue open until syzbot actually stops
triggering it. Otherwise, after every "#syz invalid", the crash will
eventually be seen again and re-sent to the mailing lists.

In the other email you mentioned
"/sys/fs/xfs/<dev>/error/metadata/EIO/max_retries" as the only way to
prevent this hang. Must max_retries be set every time after xfs is
mounted? Or is it possible to somehow preconfigure it once at VM boot
and then no longer worry about it during fuzzing?
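
As a rough illustration of what per-mount configuration might look
like, here is a minimal Python sketch that writes a value to the
max_retries file quoted above. The helper names and the sysfs_root
parameter are hypothetical and exist only for this example. The value
0 is assumed to mean "do not retry", going by my reading of the XFS
admin guide. It also assumes the per-device directory only exists
while the filesystem is mounted.

```python
from pathlib import Path


def max_retries_path(dev, sysfs_root="/sys/fs/xfs"):
    """Build the path to the EIO max_retries knob for one XFS device.

    `dev` is the device name as it appears under /sys/fs/xfs (for
    example "loop0"). `sysfs_root` is a hypothetical parameter added
    so the sketch can be exercised against a scratch directory.
    """
    return Path(sysfs_root) / dev / "error" / "metadata" / "EIO" / "max_retries"


def set_max_retries(dev, retries, sysfs_root="/sys/fs/xfs"):
    """Write `retries` to the knob and return the path that was written.

    Assumption: 0 means "fail without retrying" and -1 means "retry
    forever". Check the XFS documentation before relying on this.
    """
    path = max_retries_path(dev, sysfs_root)
    path.write_text(f"{retries}\n")
    return path
```

A test harness would presumably call set_max_retries("loop0", 0) as
root right after each mount, which is exactly the per-mount step the
question above is trying to avoid.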

> I explained this last time syzbot triggered this: this is a syzbot
> configuration problem, not a filesystem bug.
>
> [   96.418071][ T5112] XFS (loop0): Mounting V5 Filesystem c496e05e-540d-4c72-b591-04d79d8b4eeb
> [   96.593743][ T5112] XFS (loop0): Ending clean mount
> [   96.791357][ T5112] loop0: detected capacity change from 32768 to 0
> [   96.814808][ T5127] xfsaild/loop0: attempt to access beyond end of device
> [   96.814808][ T5127] loop0: rw=4097, sector=2, nr_sectors = 1 limit=0
> [   96.851235][ T5127] xfsaild/loop0: attempt to access beyond end of device
> [   96.851235][ T5127] loop0: rw=4097, sector=24, nr_sectors = 8 limit=0
> [   96.860284][    T9] XFS (loop0): metadata I/O error in "xfs_buf_ioerror_alert_ratelimited+0x7b/0x1e0" at daddr 0x2 len 1 error 5
> [   96.886045][    T9] kworker/0:1: attempt to access beyond end of device
> [   96.886045][    T9] loop0: rw=4097, sector=2, nr_sectors = 1 limit=0
> [   96.900489][ T5127] xfsaild/loop0: attempt to access beyond end of device
> [   96.900489][ T5127] loop0: rw=4097, sector=32, nr_sectors = 8 limit=0
> [   96.932892][    T9] kworker/0:1: attempt to access beyond end of device
> [   96.932892][    T9] loop0: rw=4097, sector=24, nr_sectors = 8 limit=0
> [   96.940364][ T5127] xfsaild/loop0: attempt to access beyond end of device
> [   96.940364][ T5127] loop0: rw=4097, sector=8832, nr_sectors = 64 limit=0
> .....
>
> And so it goes until something tries to freeze the filesystem and
> gets stuck waiting for writeback of metadata that is not making
> progress because XFS defaults to -retry metadata write errors
> forever- until the filesystem is shut down.
>
> If the user expects an XFS filesystem to fail fast when they
> accidentally shrink the block device under a mounted filesystem, then
> they need to configure XFS to fail metadata IO fast. Otherwise
> metadata will remain dirty and be retried until the filesystem is
> shut down or the error behaviour is reconfigured.
>
> Please fix your syzbot configurations and/or tests that screw with
> the block device under filesystems to configure XFS filesystems to
> fail fast so that these tests no longer generate unwanted noise.
>
> #syz invalid
>
> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>

-- 
Aleksandr