Re: [Syzkaller & bisect] There is task hung in xlog_grant_head_check in v6.3-rc5

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 11 Apr 2023 10:33:53 +1000

On Thu, Apr 06, 2023 at 10:34:02AM +0800, Pengfei Xu wrote:
> Hi Dave Chinner and xfs experts,
> 
> Greeting!
> 
> There is task hung in xlog_grant_head_check in v6.3-rc5 kernel.
> 
> Platform: x86 platforms
> 
> All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230405_094839_xlog_grant_head_check
> Syzkaller reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.c
> Syzkaller analysis repro.report: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.report
> Syzkaller analysis repro.stats: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.stats
> Reproduced prog repro.prog: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.prog
> Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/kconfig_origin
> Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/bisect_info.log
> 
> It could be reproduced within 2100s at most.
> Bisected and found bad commit was:
> "
> fe08cc5044486096bfb5ce9d3db4e915e53281ea
> xfs: open code sb verifier feature checks
> "
> It's just the suspected commit, because reverting the above commit on top of the v6.3-rc5
> kernel made the kernel fail, so I could not double-confirm the issue.
> 
> "
> [   24.818100] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=339 'systemd'
> [   28.230533] loop0: detected capacity change from 0 to 65536
> [   28.232522] XFS (loop0): Deprecated V4 format (crc=0) will not be supported after September 2030.
> [   28.233447] XFS (loop0): Mounting V10 Filesystem d28317a9-9e04-4f2a-be27-e55b4c413ff6

Yeah, that's the issue the bisect found, and it has nothing to do
with the log hang. fe08cc5044486 allowed filesystem versions > 5 to
be mounted; prior to that, it wasn't allowed. I think this was just a
simple oversight.
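For illustration only, here is a minimal sketch of the kind of version gate that rejects "V10" superblocks like the one in the log above. It is not the actual XFS check or fix. It assumes only that the low four bits of sb_versionnum carry the filesystem version; the macro and function names are hypothetical, local to this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not XFS code. The low four bits of sb_versionnum
 * hold the filesystem version; these constants are illustrative names.
 */
#define SKETCH_SB_VERSION_NUMBITS 0x000f
#define SKETCH_SB_VERSION_4 4
#define SKETCH_SB_VERSION_5 5

/* Accept only the on-disk versions this sketch knows how to handle. */
bool sketch_sb_version_ok(uint16_t sb_versionnum)
{
	uint16_t version = sb_versionnum & SKETCH_SB_VERSION_NUMBITS;

	return version == SKETCH_SB_VERSION_4 ||
	       version == SKETCH_SB_VERSION_5;
}
```

With this gate, a superblock whose version field reads 10 is refused, while 4 and 5 are accepted regardless of the feature bits stored in the upper bits.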

Not a big deal: everything is based on feature support checks and
not version numbers, so it's not a critical issue.

Low severity, low priority, but something we should fix and push
back to stable kernels sooner rather than later.

> [   28.234235] XFS (loop0): Log size 66 blocks too small, minimum size is 1968 blocks
> [   28.234856] XFS (loop0): Log size out of supported range.
> [   28.235289] XFS (loop0): Continuing onwards, but if log hangs are experienced then please report this message in the bug report.
> [   28.239290] XFS (loop0): Starting recovery (logdev: internal)
> [   28.240979] XFS (loop0): Ending recovery (logdev: internal)
> [  300.150944] INFO: task repro:541 blocked for more than 147 seconds.
> [  300.151523]       Not tainted 6.3.0-rc5-7e364e56293b+ #1
> [  300.152102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  300.152716] task:repro           state:D stack:0     pid:541   ppid:540    flags:0x00004004
> [  300.153373] Call Trace:
> [  300.153580]  <TASK>
> [  300.153765]  __schedule+0x40a/0xc30
> [  300.154078]  schedule+0x5b/0xe0
> [  300.154349]  xlog_grant_head_wait+0x53/0x3a0
> [  300.154715]  xlog_grant_head_check+0x1a5/0x1c0
> [  300.155113]  xfs_log_reserve+0x145/0x380
> [  300.155442]  xfs_trans_reserve+0x226/0x270
> [  300.155780]  xfs_trans_alloc+0x147/0x470
> [  300.156112]  xfs_qm_qino_alloc+0xcf/0x510

This log hang is *not a bug*. It is -expected- given that syzbot is
screwing around with fuzzed V4 filesystems. I almost just threw this
report in the bin because I saw it was a V4 filesystem being
mounted.

That is, V5 filesystems will refuse to mount a filesystem with a log
that is too small, completely avoiding this sort of hang caused by
the log being way smaller than a transaction reservation (guaranteed
hang). But we cannot do the same thing for V4 filesystems, because
there were bugs in and inconsistencies between mkfs and the kernel
over the minimum valid log size. Hence when we hit a V4 filesystem
in that situation, we issue a warning and allow operation to
continue because that's historical V4 filesystem behaviour.
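The mount-time policy described above can be modelled roughly as follows. This is a simplified sketch, not kernel code: `is_v5` stands in for "this is a V5 (CRC-enabled) filesystem", `min_blocks` for the kernel's computed minimum log size, and every name here is hypothetical. Only the "too small" case is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes of the log size check in this sketch. */
enum sketch_log_verdict {
	SKETCH_LOG_OK,		/* log is at least the minimum size */
	SKETCH_LOG_WARN,	/* too small, but V4: warn and continue */
	SKETCH_LOG_REJECT,	/* too small and V5: refuse to mount */
};

/*
 * Simplified model of the policy, not the XFS implementation.
 * log_blocks and min_blocks are both in filesystem blocks.
 */
enum sketch_log_verdict sketch_check_log_size(bool is_v5, uint32_t log_blocks,
					      uint32_t min_blocks)
{
	if (log_blocks >= min_blocks)
		return SKETCH_LOG_OK;
	if (is_v5)
		return SKETCH_LOG_REJECT;
	/* Historical V4 behaviour: mkfs and kernel disagreed on the minimum. */
	return SKETCH_LOG_WARN;
}
```

Fed the numbers from the report (66 blocks against a 1968 block minimum), the model returns SKETCH_LOG_WARN for a V4 filesystem, matching the "Continuing onwards" message, and SKETCH_LOG_REJECT for a V5 one.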

This kernel issued the "log size too small" warning, and then there
was a log space hang which is entirely predictable and not a kernel
bug. syzbot is doing something stupid, syzbot needs to be taught not
to do stupid things.
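Why the hang is predictable can be shown with a toy model of a log grant head. In the trace above, the task sleeps in xlog_grant_head_wait() waiting for log space. If a reservation is larger than the whole log, no amount of other transactions completing will ever free enough space, so the waiter sleeps forever. The struct and functions below are hypothetical and count space in abstract units; they are not the XFS grant head code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of a log grant head, not the XFS implementation.
 * total is the whole log; used is what outstanding reservations hold.
 * Assumes used <= total.
 */
struct sketch_grant_head {
	uint64_t total;
	uint64_t used;
};

/* Can this reservation be granted right now? */
bool sketch_can_grant_now(const struct sketch_grant_head *h, uint64_t res)
{
	return res <= h->total - h->used;
}

/*
 * Could it be granted once every other reservation is released?
 * If not, a task waiting for space to free up never wakes.
 */
bool sketch_can_ever_grant(const struct sketch_grant_head *h, uint64_t res)
{
	return res <= h->total;
}
```

A reservation that is merely blocked by other holders (sketch_can_grant_now() false, sketch_can_ever_grant() true) is a normal wait; one that fails both checks is the guaranteed hang described above.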

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx