Hi Dave,

On 2023-04-11 at 10:33:53 +1000, Dave Chinner wrote:
> On Thu, Apr 06, 2023 at 10:34:02AM +0800, Pengfei Xu wrote:
> > Hi Dave Chinner and xfs experts,
> >
> > Greeting!
> >
> > There is task hung in xlog_grant_head_check in v6.3-rc5 kernel.
> >
> > Platform: x86 platforms
> >
> > All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230405_094839_xlog_grant_head_check
> > Syzkaller reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.c
> > Syzkaller analysis repro.report: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.report
> > Syzkaller analysis repro.stats: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.stats
> > Reproduced prog repro.prog: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/repro.prog
> > Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/kconfig_origin
> > Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230405_094839_xlog_grant_head_check/bisect_info.log
> >
> > It could be reproduced in maximum 2100s.
> > Bisected and found bad commit was:
> > "
> > fe08cc5044486096bfb5ce9d3db4e915e53281ea
> > xfs: open code sb verifier feature checks
> > "
> > It's just the suspected commit, because reverted above commit on top of v6.3-rc5
> > kernel then made kernel failed, could not double confirm for the issue.
> >
> > "
> > [ 24.818100] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=339 'systemd'
> > [ 28.230533] loop0: detected capacity change from 0 to 65536
> > [ 28.232522] XFS (loop0): Deprecated V4 format (crc=0) will not be supported after September 2030.
> > [ 28.233447] XFS (loop0): Mounting V10 Filesystem d28317a9-9e04-4f2a-be27-e55b4c413ff6
>
> Yeah, there's the issue that the bisect found - has nothing to do
> with the log hang.
> fe08cc5044486 allowed filesystem versions > 5 to
> be mounted, prior to that it wasn't allowed. I think this was just a
> simple oversight.
>
> Not a big deal, everything is based on feature support checks and
> not version numbers, so it's not a critical issue.
>
> Low severity, low priority, but something we should fix and push
> back to stable kernels sooner rather than later.
>

Ah, so the bisect turned up a different issue from the one I was
chasing, which means it was useful rather than a waste of your time.
That's a lucky result this time! :)

> > [ 28.234235] XFS (loop0): Log size 66 blocks too small, minimum size is 1968 blocks
> > [ 28.234856] XFS (loop0): Log size out of supported range.
> > [ 28.235289] XFS (loop0): Continuing onwards, but if log hangs are experienced then please report this message in the bug report.
> > [ 28.239290] XFS (loop0): Starting recovery (logdev: internal)
> > [ 28.240979] XFS (loop0): Ending recovery (logdev: internal)
> > [ 300.150944] INFO: task repro:541 blocked for more than 147 seconds.
> > [ 300.151523] Not tainted 6.3.0-rc5-7e364e56293b+ #1
> > [ 300.152102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 300.152716] task:repro state:D stack:0 pid:541 ppid:540 flags:0x00004004
> > [ 300.153373] Call Trace:
> > [ 300.153580] <TASK>
> > [ 300.153765] __schedule+0x40a/0xc30
> > [ 300.154078] schedule+0x5b/0xe0
> > [ 300.154349] xlog_grant_head_wait+0x53/0x3a0
> > [ 300.154715] xlog_grant_head_check+0x1a5/0x1c0
> > [ 300.155113] xfs_log_reserve+0x145/0x380
> > [ 300.155442] xfs_trans_reserve+0x226/0x270
> > [ 300.155780] xfs_trans_alloc+0x147/0x470
> > [ 300.156112] xfs_qm_qino_alloc+0xcf/0x510
>
> This log hang is *not a bug*. It is -expected- given that syzbot is
> screwing around with fuzzed V4 filesystems. I almost just threw this
> report in the bin because I saw it was a V4 filesystem being
> mounted.
>
> That is, V5 filesystems will refuse to mount a filesystem with a log
> that is too small, completely avoiding this sort of hang caused by
> the log being way smaller than a transaction reservation (guaranteed
> hang). But we cannot do the same thing for V4 filesystems, because
> there were bugs in and inconsistencies between mkfs and the kernel
> over the minimum valid log size. Hence when we hit a V4 filesystem
> in that situation, we issue a warning and allow operation to
> continue because that's historical V4 filesystem behaviour.
>
> This kernel issued the "log size too small" warning, and then there
> was a log space hang which is entirely predictable and not a kernel
> bug. syzbot is doing something stupid, syzbot needs to be taught not
> to do stupid things.
>

Thanks for pointing out this syzkaller issue. I will report the problem
to syzkaller and the relevant syzkaller author.

Thanks again!
BR.
-Pengfei

> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
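
For reference, here is a minimal sketch of the mount-time policy Dave
describes above, as I understand it. It is not the kernel's code:
check_log_size(), its parameters and the enum values are hypothetical
names used only for illustration, and is_v5 stands in for whatever
feature check the kernel really uses to tell V5 (crc=1) from V4.

```c
#include <stdbool.h>

/* Hypothetical outcome of the mount-time log size check. */
enum log_check_result {
	LOG_OK,			/* log is at least the minimum size */
	LOG_WARN_CONTINUE,	/* V4: warn and keep mounting, hangs possible */
	LOG_REFUSE_MOUNT	/* V5: fail the mount outright */
};

/*
 * Illustration only, not taken from fs/xfs.
 *
 * is_v5      - assumed flag: true for a V5 (crc=1) filesystem
 * log_blocks - size of the on-disk log, in blocks
 * min_blocks - minimum log size computed for this filesystem, in blocks
 */
static enum log_check_result
check_log_size(bool is_v5, unsigned int log_blocks, unsigned int min_blocks)
{
	if (log_blocks >= min_blocks)
		return LOG_OK;

	/* Too small: V5 refuses the mount, V4 only warns (historical behaviour). */
	return is_v5 ? LOG_REFUSE_MOUNT : LOG_WARN_CONTINUE;
}
```

With the values from the log above (V4, 66 blocks, minimum 1968
blocks), check_log_size(false, 66, 1968) gives LOG_WARN_CONTINUE, which
corresponds to the "Continuing onwards" message. The same log on a V5
filesystem would give LOG_REFUSE_MOUNT.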