Re: [syzbot] Monthly xfs report

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 11 Apr 2023 11:35:12 +1000

On Thu, Mar 30, 2023 at 02:58:43AM -0700, syzbot wrote:
> Hello xfs maintainers/developers,
> 
> This is a 30-day syzbot report for the xfs subsystem.
> All related reports/information can be found at:
> https://syzkaller.appspot.com/upstream/s/xfs
> 
> During the period, 5 new issues were detected and 0 were fixed.
> In total, 23 issues are still open and 15 have been fixed so far.
> 
> Some of the still happening issues:
> 
> Crashes Repro Title
> 327     Yes   INFO: task hung in xlog_grant_head_check
>               https://syzkaller.appspot.com/bug?extid=568245b88fbaedcb1959

[  501.289306][ T5098] XFS (loop0): Mounting V4 Filesystem 5e6273b8-2167-42bb-911b-418aa14a1261
[  501.299015][ T5098] XFS (loop0): Log size 128 blocks too small, minimum size is 2880 blocks
[  501.307608][ T5098] XFS (loop0): Log size out of supported range.
[  501.313866][ T5098] XFS (loop0): Continuing onwards, but if log hangs are experienced then please report this message in the bug report.

Syzbot doing something stupid - syzbot needs to stop testing the
deprecated and soon to be unsupported v4 filesystem format.

Invalid.

> 85      Yes   KASAN: stack-out-of-bounds Read in xfs_buf_lock
>               https://syzkaller.appspot.com/bug?extid=0bc698a422b5e4ac988c

Bisection result is garbage.

Looks like a race between dquot shrinker grabbing a dquot buffer to
write back a dquot and the dquot buffer being reclaimed before it is
submitted from the delwri list. Something is dropping a buffer
reference on the floor...

More investigation needed.

> 81      Yes   WARNING in xfs_qm_dqget_cache_insert
>               https://syzkaller.appspot.com/bug?extid=6ae213503fb12e87934f

That'll be an ENOMEM warning on radix tree insert.

No big deal, the code cleans up and retries the lookup/insert
process cleanly. Could just remove the warning.

Low priority, low severity.

> 47      Yes   WARNING in xfs_bmapi_convert_delalloc
>               https://syzkaller.appspot.com/bug?extid=53b443b5c64221ee8bad

Unexpected ENOSPC because syzbot has created a inconsistency between
superblock counters and the free space btrees.  Warning is expected
as it indicates user data loss is going to occur, doesn't happen in
typical production operation, generally requires malicious
corruption of the filesystem to trigger.

Not a bug, won't fix.

> 44      Yes   INFO: task hung in xfs_buf_item_unpin
>               https://syzkaller.appspot.com/bug?extid=3f083e9e08b726fcfba2

Yup, that's a deadlock on the superblock buffer.

xfs_sync_sb_buf() is called from an ioctl of some kind, gets stuck
in the log force waiting for iclogs to complete. xfs_sync_sb_buf()
holds the buffer across the transaction commit, so the sb buffer is
locked while waiting for the log force.

At just the wrong time, the filesystem gets shut down:

  [  484.946965][ T5959] syz-executor360: attempt to access beyond end of device
  [  484.946965][ T5959] loop0: rw=432129, sector=65536, nr_sectors = 64 limit=65536
  [  484.950756][   T52] XFS (loop0): log I/O error -5
  [  484.952017][   T52] XFS (loop0): Filesystem has been shut down due to log error (0x2).
  [  484.953902][   T52] XFS (loop0): Please unmount the filesystem and rectify the problem(s).
  [  714.735393][   T28] INFO: task kworker/1:1H:52 blocked for more than 143 seconds.

And the iclog IO completion tries to unpin and abort all the log
items in the current checkpoint. One of those is the superblock
buffer, and because this is an abort:

[  714.754433][   T28]  xfs_buf_lock+0x264/0xa68
[  714.755623][   T28]  xfs_buf_item_unpin+0x2c4/0xc18
[  714.756875][   T28]  xfs_trans_committed_bulk+0x2d8/0x73c
[  714.758236][   T28]  xlog_cil_committed+0x210/0xef8

The unpin code tries to lock the buffer to pass it through to IO
completion to mark it as failed.

Real deadlock, I think it might be able to occur on any synchronous
transaction commit that holds a buffer locked across it. No
immediate fix comes to mind right now. Can only occur on a journal
IO triggered shutdown, so not somethign that happens typically in
production systems.

Low priority, medium severity.

> 13      Yes   general protection fault in __xfs_free_extent
>               https://syzkaller.appspot.com/bug?extid=bfbc1eecdfb9b10e5792

Growfs issue. Looks like a NULL pag, which means the fsbno passed
to __xfs_free_extent() is invalid. Without looking further, this
looks like it's a corrupt AGF length or superblock size and this has
resulted in the calculated fsbno starting beyond the end of the last
AG that we are about to grow. That means the agno is beyond EOFS,
xfs_perag_get(agno) ends up NULL, and __xfs_free_extent() goes
splat.  Likely requires corruption to trigger.

Low priority, low severity.

> 5       Yes   KASAN: use-after-free Read in xfs_btree_lookup_get_block
>               https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994e

Recovery of reflink COW extents, we have a corrupted journal

   [   52.495566][ T5067] XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791
   [   52.599681][ T5067] XFS (loop0): Torn write (CRC failure) detected at log block 0x180. Truncating head block from 0x200.
   [   52.636680][ T5067] XFS (loop0): Starting recovery (logdev: internal)

And then it looks to have a UAF on the refcountbt cursor that is
first initialised in xfs_refcount_recover_cow_leftovers(). Likely
tripping over a corrupted refcount btree of some kind. Probably one
for Darrick to look into.

Low priority, low severity.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx