Hi Folks, This set of patches is a result of a recent syzkaller bug report on v4 filesystems. It tripped over a corrupt AGFL count field in the AGF, but the failure didn't manifest until a crash processing extent free operations. It turns out the crash was due to deferred AGFL freeing being passed a NULLAGBNO as the block to free - this resulted in an invalid filesystem block value which translated to an invalid AGNO which then failed a perag lookup in the defered extent freeing code. While the oops was a result of a change in the 6.4-rc1 merge window, this is not the cause of the problem - the problem has always been there for V4 filesystems. However, V5 filesystems would have detected this specific corruption and would not have failed at all. There are other failures to detect bad extent ranges being freed so the v5 code needs improvement, too. The V5 verification is not done in the AGF/AGFL verifiers. It is deferred to initialising the perag from the AGF after it has been read from disk and the verifiers have been run. It is not documented in the code that the verifiers intentionally don't check free list indexes, even though they could in most situations. The verification for v5 filesystems was introduced back when we discovered that there was an issue with the on-disk sizing of the v5 AGFL header. Code was added to detect the mismatch at runtime instead of in the verifier so that it could be corrected without shutting down the filesystem. It does so by resetting the AGFL indexes and the AGFL in the first transaction that needs to fix the free list. The old contents of the AGFL are leaked, but the system continues to function without any further issues. For some reason, this verification was not extented to v4 filesystems, even though it is still necessary to catch the same corruptions on V4 filesystems. The first patch in the series simply runs the existing v5 AGFL index verification on v4 filesystems and triggers the AGFL reset code in the same way. This is all that is necessary to "fix" the syzkaller reproducer. However, one of the things the verification does not do is determine if the agbno we pull from the AGFL in xfs_alloc_get_freelist() is valid. The AGFL read verifier for V5 filesystems checks that each entry is either in range or NULLAGBNO, but that isn't sufficient. V4 filesystems can't even do this, because for a long time mkfs didn't initialise AGFLs to NULLAGBNO so can contain anything at all. Hence we have to check the AGBNO we pull from the AGFL as being in a valid range. NULLAGBNO is not valid - that's what triggered the oops later on. If we caught invalid AGBNOs in xfs_alloc_get_freelist() and returned -EFSCORRUPTED, the system would not have crashed. The second patch addresses this. Finally, nothing checked that the fsbno we are adding to xefi for deferred freeing is actually valid. This fsbno can come from any other structure in the filesystem, and we magically trust that it is valid despite many paths into this code not directly verifying the extent we are asking to be freed is valid. Freeing a bogus extent will, at best, result in a free space btree with a corrupt record in it that we trip over later and shut down the filesystem. Worst case, it will crash the system (as per the syzkaller report). The third patch in the series hardens the extent freeing code to reject bogus extent ranges at the time the code attempts to queue them for freeing. It also adds error handling to all the code that defers extent free operations so that the filesystem can be shut down immediately from the context that holds the bad extent, rather than having it get passes silently to later operations that end up tripping over it. IOWs, all three patches are needed to close the holes that lead to the system crashing rather than detecting AGF/AGFL corruption and handling it cleanly. Cheers, Dave.