On Thu, Oct 27, 2022 at 10:14:31AM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > Just prior to committing the reflink code into upstream, the xfs > maintainer at the time requested that I find a way to shard the refcount > records into two domains -- one for records tracking shared extents, and > a second for tracking CoW staging extents. The idea here was to > minimize mount time CoW reclamation by pushing all the CoW records to > the right edge of the keyspace, and it was accomplished by setting the > upper bit in rc_startblock. We don't allow AGs to have more than 2^31 > blocks, so the bit was free. > > Unfortunately, this was a very late addition to the codebase, so most of > the refcount record processing code still treats rc_startblock as a u32 > and pays no attention to whether or not the upper bit (the cow flag) is > set. This is a weakness is theoretically exploitable, since we're not > fully validating the incoming metadata records. > > Fuzzing demonstrates practical exploits of this weakness. If the cow > flag of a node block key record is corrupted, a lookup operation can go > to the wrong record block and start returning records from the wrong > cow/shared domain. This causes the math to go all wrong (since cow > domain is still implicit in the upper bit of rc_startblock) and we can > crash the kernel by tricking xfs into jumping into a nonexistent AG and > tripping over xfs_perag_get(mp, <nonexistent AG>) returning NULL. > > To fix this, start tracking the domain as an explicit part of struct > xfs_refcount_irec, adjust all refcount functions to check the domain > of a returned record, and alter the function definitions to accept them > where necessary. > > Found by fuzzing keys[2].cowflag = add in xfs/464. > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> Couple of minor things. > @@ -169,12 +183,17 @@ xfs_refcount_update( > struct xfs_refcount_irec *irec) > { > union xfs_btree_rec rec; > + __u32 start; > int error; Why __u32 and not, say, u32 or uint32_t? u32 is used 10x more often than __u32 in the kernel code, and in XFS only seem to use the __ variants in UAPI structures. > @@ -364,6 +388,7 @@ xfs_refcount_split_extent( > error = -EFSCORRUPTED; > goto out_error; > } > + > if (rcext.rc_startblock == agbno || xfs_refc_next(&rcext) <= agbno) > return 0; > Random new whitespace? Other than that it looks good. Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> -- Dave Chinner david@xxxxxxxxxxxxx