On Thu, Apr 06, 2023 at 10:34:02AM +0800, Pengfei Xu wrote:
> Hi Dave Chinner and xfs experts,
>
> Greeting!
>
> There is task hung in xlog_grant_head_check in v6.3-rc5 kernel.
>
> Platform: x86 platforms
>
> All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230405_094839_xlog_grant_head_check
> Syzkaller reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.c
> Syzkaller analysis repro.report: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.report
> Syzkaller analysis repro.stats: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.stats
> Reproduced prog repro.prog: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.prog
> Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/kconfig_origin
> Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/bisect_info.log
>
> It could be reproduced in maximum 2100s.
> Bisected and found bad commit was:
> "
> fe08cc5044486096bfb5ce9d3db4e915e53281ea
> xfs: open code sb verifier feature checks
> "
> It's just the suspected commit, because reverted above commit on top of v6.3-rc5
> kernel then made kernel failed, could not double confirm for the issue.
>
> "
> [ 24.818100] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=339 'systemd'
> [ 28.230533] loop0: detected capacity change from 0 to 65536
> [ 28.232522] XFS (loop0): Deprecated V4 format (crc=0) will not be supported after September 2030.
> [ 28.233447] XFS (loop0): Mounting V10 Filesystem d28317a9-9e04-4f2a-be27-e55b4c413ff6

Yeah, there's the issue that the bisect found - has nothing to do
with the log hang. fe08cc5044486 allowed filesystem versions > 5 to
be mounted, prior to that it wasn't allowed. I think this was just a
simple oversight.
Not a big deal, everything is based on feature support checks and not
version numbers, so it's not a critical issue. Low severity, low
priority, but something we should fix and push back to stable
kernels sooner rather than later.

> [ 28.234235] XFS (loop0): Log size 66 blocks too small, minimum size is 1968 blocks
> [ 28.234856] XFS (loop0): Log size out of supported range.
> [ 28.235289] XFS (loop0): Continuing onwards, but if log hangs are experienced then please report this message in the bug report.
> [ 28.239290] XFS (loop0): Starting recovery (logdev: internal)
> [ 28.240979] XFS (loop0): Ending recovery (logdev: internal)
> [ 300.150944] INFO: task repro:541 blocked for more than 147 seconds.
> [ 300.151523] Not tainted 6.3.0-rc5-7e364e56293b+ #1
> [ 300.152102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 300.152716] task:repro state:D stack:0 pid:541 ppid:540 flags:0x00004004
> [ 300.153373] Call Trace:
> [ 300.153580] <TASK>
> [ 300.153765] __schedule+0x40a/0xc30
> [ 300.154078] schedule+0x5b/0xe0
> [ 300.154349] xlog_grant_head_wait+0x53/0x3a0
> [ 300.154715] xlog_grant_head_check+0x1a5/0x1c0
> [ 300.155113] xfs_log_reserve+0x145/0x380
> [ 300.155442] xfs_trans_reserve+0x226/0x270
> [ 300.155780] xfs_trans_alloc+0x147/0x470
> [ 300.156112] xfs_qm_qino_alloc+0xcf/0x510

This log hang is *not a bug*. It is -expected- given that syzbot is
screwing around with fuzzed V4 filesystems. I almost just threw this
report in the bin because I saw it was a V4 filesystem being mounted.

That is, the kernel will refuse to mount a V5 filesystem with a log
that is too small, completely avoiding this sort of hang caused by
the log being way smaller than a transaction reservation (guaranteed
hang). But we cannot do the same thing for V4 filesystems, because
there were bugs in and inconsistencies between mkfs and the kernel
over the minimum valid log size.
Hence when we hit a V4 filesystem in that situation, we issue a
warning and allow operation to continue because that's historical V4
filesystem behaviour. This kernel issued the "log size too small"
warning, and then there was a log space hang which is entirely
predictable and not a kernel bug.

syzbot is doing something stupid, syzbot needs to be taught not to
do stupid things.

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
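
For illustration only, the mount-time policy described above can be
sketched as a small standalone userspace function. This is not the
kernel's code: check_log_size(), the enum and its values are
hypothetical names made up for this sketch, and the minimum log size
is simply passed in rather than computed from transaction
reservations as a real implementation would have to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcomes of a mount-time log size check. */
enum log_check_result {
	LOG_OK,			/* log is at least the minimum size */
	LOG_WARN_CONTINUE,	/* V4: warn and keep mounting, hangs possible */
	LOG_REJECT_MOUNT	/* V5: refuse to mount */
};

/*
 * Sketch of the policy from the mail: an undersized log is fatal for
 * a V5 filesystem, but only a warning for a V4 filesystem.
 * min_log_blocks is an input here (assumption); this sketch does not
 * derive it.
 */
static enum log_check_result
check_log_size(bool is_v5, unsigned int log_blocks,
	       unsigned int min_log_blocks)
{
	assert(min_log_blocks > 0);

	if (log_blocks >= min_log_blocks)
		return LOG_OK;

	fprintf(stderr,
		"Log size %u blocks too small, minimum size is %u blocks\n",
		log_blocks, min_log_blocks);

	if (is_v5)
		return LOG_REJECT_MOUNT;

	/* Historical V4 behaviour: continue despite the warning. */
	return LOG_WARN_CONTINUE;
}
```

With the numbers from the report (a 66 block log against a 1968
block minimum), check_log_size(false, 66, 1968) returns
LOG_WARN_CONTINUE, which is the path that leads to the predictable
log space hang, while check_log_size(true, 66, 1968) returns
LOG_REJECT_MOUNT.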