Re: [Syzkaller & bisect] There is task hung in xlog_grant_head_check in v6.3-rc5

Pengfei Xu <pengfei.xu@xxxxxxxxx> · Tue, 11 Apr 2023 16:15:20 +0800

Hi Dave,

On 2023-04-11 at 10:33:53 +1000, Dave Chinner wrote:
> On Thu, Apr 06, 2023 at 10:34:02AM +0800, Pengfei Xu wrote:
> > Hi Dave Chinner and xfs experts,
> > 
> > Greetings!
> > 
> > There is a task hang in xlog_grant_head_check in the v6.3-rc5 kernel.
> > 
> > Platform: x86 platforms
> > 
> > All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230405_094839_xlog_grant_head_check
> > Syzkaller reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.c
> > Syzkaller analysis repro.report: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.report
> > Syzkaller analysis repro.stats: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.stats
> > Reproduced prog repro.prog: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.prog
> > Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/kconfig_origin
> > Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/bisect_info.log
> > 
> > It can be reproduced within at most 2100s.
> > Bisected and found bad commit was:
> > "
> > fe08cc5044486096bfb5ce9d3db4e915e53281ea
> > xfs: open code sb verifier feature checks
> > "
> > It's only the suspected commit: reverting the above commit on top of the
> > v6.3-rc5 kernel made the kernel fail, so I could not double-check the issue.
> > 
> > "
> > [   24.818100] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=339 'systemd'
> > [   28.230533] loop0: detected capacity change from 0 to 65536
> > [   28.232522] XFS (loop0): Deprecated V4 format (crc=0) will not be supported after September 2030.
> > [   28.233447] XFS (loop0): Mounting V10 Filesystem d28317a9-9e04-4f2a-be27-e55b4c413ff6
> 
> Yeah, there's the issue that the bisect found - has nothing to do
> with the log hang. fe08cc5044486 allowed filesystem versions > 5 to
> be mounted, prior to that it wasn't allowed. I think this was just a
> simple oversight.
> 
> Not a big deal, everything is based on feature support checks and
> not version numbers, so it's not a critical issue.
> 
> Low severity, low priority, but something we should fix and push
> back to stable kernels sooner rather than later.
> 
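To make the distinction above concrete, here is a toy Python model of a superblock version gate. It is a sketch only, not the code in fe08cc5044486 or anywhere else in the kernel: `MAX_KNOWN_SB_VERSION`, `mount_allowed` and the `reject_unknown_versions` switch are hypothetical names. The only facts taken from this thread are that versions > 5 were refused before that commit and accepted after it.

```python
# Toy model only: every name here is hypothetical, not a kernel symbol.
MAX_KNOWN_SB_VERSION = 5  # assumption: V5 is the newest defined superblock version


def mount_allowed(sb_version: int, reject_unknown_versions: bool) -> bool:
    """Return True if this model would let the mount proceed.

    reject_unknown_versions=True models the behaviour before the bisected
    commit; False models the behaviour after it, where the version number
    is no longer range-checked.
    """
    if reject_unknown_versions and sb_version > MAX_KNOWN_SB_VERSION:
        return False
    return True


# The dmesg in this report shows "Mounting V10 Filesystem".
print(mount_allowed(10, reject_unknown_versions=True))
print(mount_allowed(10, reject_unknown_versions=False))
```

In this model the fuzzed "V10" superblock is refused only when the explicit version range check is present, which matches why the bisect landed on that commit even though it is unrelated to the log hang.
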
  Ah, so the bisect found a different issue from the one it was aimed at,
  which makes the bisect rewarding rather than a waste of your time.
  That's a lucky outcome this time!  :)

> > [   28.234235] XFS (loop0): Log size 66 blocks too small, minimum size is 1968 blocks
> > [   28.234856] XFS (loop0): Log size out of supported range.
> > [   28.235289] XFS (loop0): Continuing onwards, but if log hangs are experienced then please report this message in the bug report.
> > [   28.239290] XFS (loop0): Starting recovery (logdev: internal)
> > [   28.240979] XFS (loop0): Ending recovery (logdev: internal)
> > [  300.150944] INFO: task repro:541 blocked for more than 147 seconds.
> > [  300.151523]       Not tainted 6.3.0-rc5-7e364e56293b+ #1
> > [  300.152102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  300.152716] task:repro           state:D stack:0     pid:541   ppid:540    flags:0x00004004
> > [  300.153373] Call Trace:
> > [  300.153580]  <TASK>
> > [  300.153765]  __schedule+0x40a/0xc30
> > [  300.154078]  schedule+0x5b/0xe0
> > [  300.154349]  xlog_grant_head_wait+0x53/0x3a0
> > [  300.154715]  xlog_grant_head_check+0x1a5/0x1c0
> > [  300.155113]  xfs_log_reserve+0x145/0x380
> > [  300.155442]  xfs_trans_reserve+0x226/0x270
> > [  300.155780]  xfs_trans_alloc+0x147/0x470
> > [  300.156112]  xfs_qm_qino_alloc+0xcf/0x510
> 
> This log hang is *not a bug*. It is -expected- given that syzbot is
> screwing around with fuzzed V4 filesystems. I almost just threw this
> report in the bin because I saw it was a V4 filesystem being
> mounted.
> 
> That is, V5 filesystems will refuse to mount a filesystem with a log
> that is too small, completely avoiding this sort of hang caused by
> the log being way smaller than a transaction reservation (guaranteed
> hang). But we cannot do the same thing for V4 filesystems, because
> there were bugs in and inconsistencies between mkfs and the kernel
> over the minimum valid log size. Hence when we hit a V4 filesystem
> in that situation, we issue a warning and allow operation to
> continue because that's historical V4 filesystem behaviour.
> 
> This kernel issued the "log size too small" warning, and then there
> was a log space hang which is entirely predictable and not a kernel
> bug. syzbot is doing something stupid, syzbot needs to be taught not
> to do stupid things.
> 
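The policy described above can be sketched as a small decision function. This is a toy Python model under stated assumptions, not the kernel's log mount code: `check_log_size` and its return strings are hypothetical. Keying the decision on a `has_crc` flag follows the "crc=0" wording in the dmesg output above, since feature bits rather than version numbers are what matter.

```python
# Toy model only: function name and return values are hypothetical.
def check_log_size(has_crc: bool, log_blocks: int, min_log_blocks: int) -> str:
    """Model the mount-time reaction to an undersized log.

    Returns:
      "ok"     - the log is large enough, mount normally
      "reject" - V5 (crc=1) filesystem: refuse to mount
      "warn"   - V4 (crc=0) filesystem: warn and continue, a hang is possible
    """
    if log_blocks >= min_log_blocks:
        return "ok"
    if has_crc:
        return "reject"
    return "warn"


# Values from the dmesg in this report: 66 blocks, minimum 1968, crc=0.
print(check_log_size(has_crc=False, log_blocks=66, min_log_blocks=1968))
```

With the values from this report the model takes the "warn" path, which corresponds to the "Continuing onwards, but if log hangs are experienced..." message followed by the hang in xlog_grant_head_wait.
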
 Thanks for pointing out this syzkaller issue. I will report the problem to
 syzkaller and the relevant syzkaller author.

 Thanks again!
 BR.
 -Pengfei

> -Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx