On Fri, Oct 28, 2022 at 08:03:07AM +1100, Dave Chinner wrote: > On Thu, Oct 27, 2022 at 10:14:31AM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > Just prior to committing the reflink code into upstream, the xfs > > maintainer at the time requested that I find a way to shard the refcount > > records into two domains -- one for records tracking shared extents, and > > a second for tracking CoW staging extents. The idea here was to > > minimize mount time CoW reclamation by pushing all the CoW records to > > the right edge of the keyspace, and it was accomplished by setting the > > upper bit in rc_startblock. We don't allow AGs to have more than 2^31 > > blocks, so the bit was free. > > > > Unfortunately, this was a very late addition to the codebase, so most of > > the refcount record processing code still treats rc_startblock as a u32 > > and pays no attention to whether or not the upper bit (the cow flag) is > > set. This is a weakness is theoretically exploitable, since we're not > > fully validating the incoming metadata records. > > > > Fuzzing demonstrates practical exploits of this weakness. If the cow > > flag of a node block key record is corrupted, a lookup operation can go > > to the wrong record block and start returning records from the wrong > > cow/shared domain. This causes the math to go all wrong (since cow > > domain is still implicit in the upper bit of rc_startblock) and we can > > crash the kernel by tricking xfs into jumping into a nonexistent AG and > > tripping over xfs_perag_get(mp, <nonexistent AG>) returning NULL. > > > > To fix this, start tracking the domain as an explicit part of struct > > xfs_refcount_irec, adjust all refcount functions to check the domain > > of a returned record, and alter the function definitions to accept them > > where necessary. > > > > Found by fuzzing keys[2].cowflag = add in xfs/464. > > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > Couple of minor things. > > > @@ -169,12 +183,17 @@ xfs_refcount_update( > > struct xfs_refcount_irec *irec) > > { > > union xfs_btree_rec rec; > > + __u32 start; > > int error; > > Why __u32 and not, say, u32 or uint32_t? u32 is used 10x more often > than __u32 in the kernel code, and in XFS only seem to use the __ > variants in UAPI structures. Force of habit, I guess. I'll rework it as uint32_t. > > @@ -364,6 +388,7 @@ xfs_refcount_split_extent( > > error = -EFSCORRUPTED; > > goto out_error; > > } > > + > > if (rcext.rc_startblock == agbno || xfs_refc_next(&rcext) <= agbno) > > return 0; > > > > Random new whitespace? Oops. I overlooked that when I was slicing and dicing yesterday. > Other than that it looks good. > > Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> Thanks! --D > -- > Dave Chinner > david@xxxxxxxxxxxxx