[PATCH 0/3] xfs: improve AGF/AGFL verification

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Folks,

This set of patches is a result of a recent syzkaller bug report on
v4 filesystems. It tripped over a corrupt AGFL count field in the
AGF, but the failure didn't manifest until a crash processing extent
free operations.

It turns out the crash was due to deferred AGFL freeing being passed
a NULLAGBNO as the block to free - this resulted in an invalid
filesystem block value which translated to an invalid AGNO which
then failed a perag lookup in the defered extent freeing code.

While the oops was a result of a change in the 6.4-rc1 merge window,
this is not the cause of the problem - the problem has always been
there for V4 filesystems. However, V5 filesystems would have
detected this specific corruption and would not have failed at all.
There are other failures to detect bad extent ranges being
freed so the v5 code needs improvement, too.

The V5 verification is not done in the AGF/AGFL verifiers.
It is deferred to initialising the perag from the AGF after it has
been read from disk and the verifiers have been run. It is not
documented in the code that the verifiers intentionally don't check
free list indexes, even though they could in most situations.

The verification for v5 filesystems was introduced back when we
discovered that there was an issue with the on-disk sizing of the v5
AGFL header. Code was added to detect the mismatch at runtime
instead of in the verifier so that it could be corrected without
shutting down the filesystem. It does so by resetting the AGFL
indexes and the AGFL in the first transaction that needs to fix the
free list. The old contents of the AGFL are leaked, but the system
continues to function without any further issues.

For some reason, this verification was not extented to v4
filesystems, even though it is still necessary to catch the same
corruptions on V4 filesystems. The first patch in the series simply
runs the existing v5 AGFL index verification on v4 filesystems and
triggers the AGFL reset code in the same way. This is all that is
necessary to "fix" the syzkaller reproducer.

However, one of the things the verification does not do is determine
if the agbno we pull from the AGFL in xfs_alloc_get_freelist() is
valid. The AGFL read verifier for V5 filesystems checks that each
entry is either in range or NULLAGBNO, but that isn't sufficient. V4
filesystems can't even do this, because for a long time mkfs didn't
initialise AGFLs to NULLAGBNO so can contain anything at all.

Hence we have to check the AGBNO we pull from the AGFL as being in a
valid range. NULLAGBNO is not valid - that's what triggered the oops
later on.  If we caught invalid AGBNOs in xfs_alloc_get_freelist()
and returned -EFSCORRUPTED, the system would not have crashed. The
second patch addresses this.

Finally, nothing checked that the fsbno we are adding to xefi for
deferred freeing is actually valid. This fsbno can come from any
other structure in the filesystem, and we magically trust that it is
valid despite many paths into this code not directly verifying the
extent we are asking to be freed is valid. Freeing a bogus extent
will, at best, result in a free space btree with a corrupt record in
it that we trip over later and shut down the filesystem. Worst
case, it will crash the system (as per the syzkaller report).

The third patch in the series hardens the extent freeing code to
reject bogus extent ranges at the time the code attempts to queue
them for freeing. It also adds error handling to all the code that
defers extent free operations so that the filesystem can be shut
down immediately from the context that holds the bad extent, rather
than having it get passes silently to later operations that end up
tripping over it.

IOWs, all three patches are needed to close the holes that lead to
the system crashing rather than detecting AGF/AGFL corruption and
handling it cleanly.

Cheers,

Dave.




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux