On Tue, Apr 11, 2023 at 11:35:12AM +1000, Dave Chinner wrote:
> On Thu, Mar 30, 2023 at 02:58:43AM -0700, syzbot wrote:
> > Hello xfs maintainers/developers,
> >
> > This is a 30-day syzbot report for the xfs subsystem.
> > All related reports/information can be found at:
> > https://syzkaller.appspot.com/upstream/s/xfs
> >
> > During the period, 5 new issues were detected and 0 were fixed.
> > In total, 23 issues are still open and 15 have been fixed so far.
> >
> > Some of the still happening issues:
> >
> > Crashes Repro Title
> > 327     Yes   INFO: task hung in xlog_grant_head_check
> >               https://syzkaller.appspot.com/bug?extid=568245b88fbaedcb1959
>
> [ 501.289306][ T5098] XFS (loop0): Mounting V4 Filesystem 5e6273b8-2167-42bb-911b-418aa14a1261
> [ 501.299015][ T5098] XFS (loop0): Log size 128 blocks too small, minimum size is 2880 blocks
> [ 501.307608][ T5098] XFS (loop0): Log size out of supported range.
> [ 501.313866][ T5098] XFS (loop0): Continuing onwards, but if log hangs are experienced then please report this message in the bug report.
>
> Syzbot doing something stupid - syzbot needs to stop testing the
> deprecated and soon to be unsupported v4 filesystem format.
>
> Invalid.
>
> > 85      Yes   KASAN: stack-out-of-bounds Read in xfs_buf_lock
> >               https://syzkaller.appspot.com/bug?extid=0bc698a422b5e4ac988c
>
> Bisection result is garbage.
>
> Looks like a race between dquot shrinker grabbing a dquot buffer to
> write back a dquot and the dquot buffer being reclaimed before it is
> submitted from the delwri list. Something is dropping a buffer
> reference on the floor...
>
> More investigation needed.
>
> > 81      Yes   WARNING in xfs_qm_dqget_cache_insert
> >               https://syzkaller.appspot.com/bug?extid=6ae213503fb12e87934f
>
> That'll be an ENOMEM warning on radix tree insert.
>
> No big deal, the code cleans up and retries the lookup/insert
> process cleanly. Could just remove the warning.
>
> Low priority, low severity.
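To make the dquot cache point concrete, here is a minimal userspace sketch of a lookup/insert/retry loop like the one described above. It assumes, as the report suggests, that a failed insert (an -ENOMEM, or a duplicate -EEXIST) is handled by freeing the new object and restarting from the lookup. Everything in it is hypothetical: struct dq, cache_lookup(), cache_insert(), dq_get() and the inject_enomem knob are illustrative stand-ins for the radix tree and xfs_qm_dqget_cache_insert(), not XFS code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dquot; not struct xfs_dquot. */
struct dq {
	unsigned int id;
};

#define CACHE_SLOTS 16

/* Toy id-indexed cache standing in for the per-quota-type radix tree. */
static struct dq *cache[CACHE_SLOTS];

/* Test knob: how many upcoming inserts should fail with -ENOMEM. */
static int inject_enomem;

static struct dq *cache_lookup(unsigned int id)
{
	return id < CACHE_SLOTS ? cache[id] : NULL;
}

/* Returns 0, -EEXIST if the id is already cached, or an injected -ENOMEM. */
static int cache_insert(struct dq *dqp)
{
	assert(dqp->id < CACHE_SLOTS);
	if (inject_enomem > 0) {
		inject_enomem--;
		return -ENOMEM;
	}
	if (cache[dqp->id])
		return -EEXIST;
	cache[dqp->id] = dqp;
	return 0;
}

/*
 * Lookup-or-insert: on any insert failure, free the new object and start
 * over from the lookup. No warning is emitted for -ENOMEM.
 */
static struct dq *dq_get(unsigned int id, int *retries)
{
	struct dq *dqp;

	if (id >= CACHE_SLOTS)
		return NULL;
	*retries = 0;
restart:
	dqp = cache_lookup(id);
	if (dqp)
		return dqp;

	dqp = malloc(sizeof(*dqp));
	if (!dqp)
		return NULL;
	dqp->id = id;

	if (cache_insert(dqp) != 0) {
		free(dqp);
		(*retries)++;
		goto restart;
	}
	return dqp;
}
```

Because the failure path already frees and retries, a warning on -ENOMEM adds noise without changing the outcome, which is the argument for simply dropping it. Note that this toy loop keeps retrying for as long as the insert keeps failing.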
>
> > 47      Yes   WARNING in xfs_bmapi_convert_delalloc
> >               https://syzkaller.appspot.com/bug?extid=53b443b5c64221ee8bad
>
> Unexpected ENOSPC because syzbot has created an inconsistency between
> superblock counters and the free space btrees. Warning is expected
> as it indicates user data loss is going to occur, doesn't happen in
> typical production operation, generally requires malicious
> corruption of the filesystem to trigger.
>
> Not a bug, won't fix.
>
> > 44      Yes   INFO: task hung in xfs_buf_item_unpin
> >               https://syzkaller.appspot.com/bug?extid=3f083e9e08b726fcfba2
>
> Yup, that's a deadlock on the superblock buffer.
>
> xfs_sync_sb_buf() is called from an ioctl of some kind, gets stuck
> in the log force waiting for iclogs to complete. xfs_sync_sb_buf()
> holds the buffer across the transaction commit, so the sb buffer is
> locked while waiting for the log force.
>
> At just the wrong time, the filesystem gets shut down:
>
> [ 484.946965][ T5959] syz-executor360: attempt to access beyond end of device
> [ 484.946965][ T5959] loop0: rw=432129, sector=65536, nr_sectors = 64 limit=65536
> [ 484.950756][ T52] XFS (loop0): log I/O error -5
> [ 484.952017][ T52] XFS (loop0): Filesystem has been shut down due to log error (0x2).
> [ 484.953902][ T52] XFS (loop0): Please unmount the filesystem and rectify the problem(s).
> [ 714.735393][ T28] INFO: task kworker/1:1H:52 blocked for more than 143 seconds.
>
> And the iclog IO completion tries to unpin and abort all the log
> items in the current checkpoint. One of those is the superblock
> buffer, and because this is an abort:
>
> [ 714.754433][ T28] xfs_buf_lock+0x264/0xa68
> [ 714.755623][ T28] xfs_buf_item_unpin+0x2c4/0xc18
> [ 714.756875][ T28] xfs_trans_committed_bulk+0x2d8/0x73c
> [ 714.758236][ T28] xlog_cil_committed+0x210/0xef8
>
> The unpin code tries to lock the buffer to pass it through to IO
> completion to mark it as failed.
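The xfs_buf_item_unpin hang can be read as a two-task cycle in a wait-for graph: the ioctl caller in xfs_sync_sb_buf() holds the sb buffer lock while it waits for the log force, and the iclog completion that would end that wait must take the same lock to abort the buffer after the shutdown. The sketch below is a hypothetical model (the task names, struct model, build_wait_for() and has_deadlock() are illustrative, not XFS code). It reports a cycle only when all three conditions hold at once, and shows the cycle disappearing if the waiter does not hold the buffer lock while it waits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the sb buffer deadlock; not XFS code. */
enum task {
	NO_TASK = -1,
	IOCTL_TASK,		/* caller stuck in xfs_sync_sb_buf() */
	LOG_COMPLETION_TASK,	/* kworker running iclog IO completion */
	NR_TASKS,
};

struct model {
	int sb_buf_owner;		/* task holding the sb buffer lock, or NO_TASK */
	bool waiter_in_log_force;	/* ioctl task sleeping in the log force */
	bool abort_needs_buf_lock;	/* shutdown abort must lock the sb buffer */
};

/* waits_for[t] is the task that t is blocked on, or NO_TASK. */
static void build_wait_for(const struct model *m, int waits_for[NR_TASKS])
{
	assert(m->sb_buf_owner >= NO_TASK && m->sb_buf_owner < NR_TASKS);

	/* The ioctl task waits for whoever completes the log force. */
	waits_for[IOCTL_TASK] =
		m->waiter_in_log_force ? LOG_COMPLETION_TASK : NO_TASK;

	/* The completion waits for the lock holder, if it needs the lock. */
	if (m->abort_needs_buf_lock && m->sb_buf_owner != NO_TASK &&
	    m->sb_buf_owner != LOG_COMPLETION_TASK)
		waits_for[LOG_COMPLETION_TASK] = m->sb_buf_owner;
	else
		waits_for[LOG_COMPLETION_TASK] = NO_TASK;
}

/* Follow wait-for edges from each task; getting back to it is a cycle. */
static bool has_deadlock(const struct model *m)
{
	int waits_for[NR_TASKS];

	build_wait_for(m, waits_for);
	for (int start = 0; start < NR_TASKS; start++) {
		int t = waits_for[start];

		for (int steps = 0; t != NO_TASK && steps < NR_TASKS; steps++) {
			if (t == start)
				return true;
			t = waits_for[t];
		}
	}
	return false;
}
```

A synchronous commit with a shutdown abort (owner IOCTL_TASK, both flags true) forms the cycle; without the abort, or with the buffer unlocked during the wait, it does not. The unlocked case is only an assumption about how an approach that stops holding the buffer across the log force would avoid the hang; the model covers lock ownership, not the real log and AIL machinery.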
>
> Real deadlock, I think it might be able to occur on any synchronous
> transaction commit that holds a buffer locked across it. No
> immediate fix comes to mind right now. Can only occur on a journal
> IO triggered shutdown, so not something that happens typically in
> production systems.

Force log, then xfs_ail_push_all_sync()? It's SETLABEL, who cares how
slow it is?

> Low priority, medium severity.
>
> > 13      Yes   general protection fault in __xfs_free_extent
> >               https://syzkaller.appspot.com/bug?extid=bfbc1eecdfb9b10e5792
>
> Growfs issue. Looks like a NULL pag, which means the fsbno passed
> to __xfs_free_extent() is invalid. Without looking further, this
> looks like it's a corrupt AGF length or superblock size and this has
> resulted in the calculated fsbno starting beyond the end of the last
> AG that we are about to grow. That means the agno is beyond EOFS,
> xfs_perag_get(agno) ends up NULL, and __xfs_free_extent() goes
> splat. Likely requires corruption to trigger.
>
> Low priority, low severity.

I've been wondering for quite a while if the code that creates those
defer items ought to be shutting down the fs if they can't get a perag
to stuff in the intent. xfs_perag_intent_get seems like a reasonable
place to shut down the fs with a corruption warning if someone feeds in
a totally garbage fsblock range.

> > 5       Yes   KASAN: use-after-free Read in xfs_btree_lookup_get_block
> >               https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994e
>
> Recovery of reflink COW extents, we have a corrupted journal:
>
> [ 52.495566][ T5067] XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791
> [ 52.599681][ T5067] XFS (loop0): Torn write (CRC failure) detected at log block 0x180. Truncating head block from 0x200.
> [ 52.636680][ T5067] XFS (loop0): Starting recovery (logdev: internal)
>
> And then it looks to have a UAF on the refcountbt cursor that is
> first initialised in xfs_refcount_recover_cow_leftovers(). Likely
> tripping over a corrupted refcount btree of some kind. Probably one
> for Darrick to look into.

Somehow the bogus refcount level field in the AGF is getting past the
verifiers. I'll look into this later.

--D

> Low priority, low severity.
>
> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
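On the __xfs_free_extent splat: XFS encodes a filesystem block number as (agno << sb_agblklog) | agbno, so an fsbno computed past the end of the last AG decodes to an agno that has no perag, and the lookup returns NULL. The sketch below shows that decoding plus a validating lookup in the spirit of the suggestion to shut the fs down in xfs_perag_intent_get on a garbage fsblock range. It is a simplified userspace model under stated assumptions: struct geom, perag_get(), perag_intent_get_checked() and the shut_down flag are hypothetical stand-ins, and the real range checks may well differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified geometry; not struct xfs_mount. */
struct geom {
	unsigned int agblklog;	/* bits used for the AG-relative block number */
	uint32_t agcount;	/* number of allocation groups */
	uint32_t agblocks;	/* blocks per AG (<= 1 << agblklog) */
};

struct perag {
	uint32_t agno;
};

/* Toy per-AG table; a real filesystem looks these up dynamically. */
static struct perag pags[4] = { { 0 }, { 1 }, { 2 }, { 3 } };

/* Set when the validating lookup would shut the fs down. */
static bool shut_down;

static uint32_t fsb_to_agno(const struct geom *g, uint64_t fsbno)
{
	return (uint32_t)(fsbno >> g->agblklog);
}

static uint32_t fsb_to_agbno(const struct geom *g, uint64_t fsbno)
{
	return (uint32_t)(fsbno & ((1ULL << g->agblklog) - 1));
}

/* NULL for an agno past the last AG, the case that went splat. */
static struct perag *perag_get(const struct geom *g, uint32_t agno)
{
	assert(g->agcount <= sizeof(pags) / sizeof(pags[0]));
	return agno < g->agcount ? &pags[agno] : NULL;
}

/* Hypothetical validating lookup: reject garbage ranges up front. */
static struct perag *perag_intent_get_checked(const struct geom *g,
					       uint64_t fsbno, uint32_t len)
{
	struct perag *pag = perag_get(g, fsb_to_agno(g, fsbno));
	uint32_t agbno = fsb_to_agbno(g, fsbno);

	if (!pag || len == 0 || agbno >= g->agblocks ||
	    len > g->agblocks - agbno) {
		shut_down = true;	/* real code would report corruption */
		return NULL;
	}
	return pag;
}
```

With a check like this, a garbage range is rejected when the intent is created (flagged here by setting shut_down) instead of reaching __xfs_free_extent() with a NULL pag.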