On Wed, Nov 02, 2022 at 06:08:12PM -0700, Darrick J. Wong wrote:
> On Wed, Nov 02, 2022 at 01:16:26PM +1100, Dave Chinner wrote:
> > On Sun, Oct 02, 2022 at 11:20:16AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > 
> > > For keyspace fullness scans, we want to be able to mask off the parts of
> > > the key that we don't care about.  For most btree types we /do/ want the
> > > full keyspace, but for checking that a given space usage also has a full
> > > complement of rmapbt records (even if different/multiple owners) we need
> > > this masking so that we only track sparseness of rm_startblock, not the
> > > whole keyspace (which is extremely sparse).
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > ---
> > ....
> > > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > > index edea6db8d8e4..6fbce2f3c17e 100644
> > > --- a/fs/xfs/libxfs/xfs_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_btree.c
> > > @@ -5020,12 +5020,33 @@ struct xfs_btree_scan_keyfill {
> > >  	union xfs_btree_key	start_key;
> > >  	union xfs_btree_key	end_key;
> > > 
> > > +	/* Mask for key comparisons, if desired. */
> > > +	union xfs_btree_key	*key_mask;
> > 
> > How does this mask work? i.e. the way it is supposed to be used is
> > not documented in either the commit message or the code....
> 
> When I merge all of this into _diff_two_keys, I'll add this to its
> description:
> 
> "Normally, each btree type's _diff_two_keys implementation will use all
> available btree key fields to compare the given keys.  However, some
> callers may prefer to ignore some part of the btree record keyspace when
> performing the comparison.
> 
> These callers should create a union xfs_btree_key object, set the fields
> that *should* be a part of the comparison to any nonzero value, and
> leave the rest zeroed.  That object should be passed in as the @key_mask
> value."
> 
> For a concrete example, take the rmap space scanning function below.  If
> we only want to know if a certain range of physical blocks has zero rmap
> records, enough rmap records to account for every block in the range, or
> some records in between, we'd initialize the key mask as follows:
> 
> 	union xfs_btree_key	km = {
> 		.rmap.rm_startblock = 1,
> 	};
> 
> and the _scan_keyfill function will only look that far into the key
> comparisons.
> 
> > > +STATIC int64_t
> > > +xfs_btree_diff_two_masked_keys(
> > > +	struct xfs_btree_cur		*cur,
> > > +	const union xfs_btree_key	*key1,
> > > +	const union xfs_btree_key	*key2,
> > > +	const union xfs_btree_key	*mask)
> > > +{
> > > +	union xfs_btree_key		mk1, mk2;
> > > +
> > > +	if (likely(!mask))
> > > +		return cur->bc_ops->diff_two_keys(cur, key1, key2);
> > > +
> > > +	cur->bc_ops->mask_key(cur, &mk1, key1, mask);
> > > +	cur->bc_ops->mask_key(cur, &mk2, key2, mask);
> > > +
> > > +	return cur->bc_ops->diff_two_keys(cur, &mk1, &mk2);
> > > +}
> > 
> > This seems .... very abstract.
> 
> Yes, I've gone unusually deep into database theory here...
> 
> > Why not just add a mask pointer to xfs_btree_diff_two_keys(),
> > and in each of the btree implementations of ->diff_two_keys()
> > change them to:
> > 
> > 	if (mask) {
> > 		/* mask keys */
> > 	}
> > 	/* run existing diff on two keys */
> > 
> > That gets rid of all these function pointer calls, and we only need
> > a single "diff two keys" api to be defined...
> 
> Ok, I'll look into combining mask_key into diff_two_keys.
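
(Before I talked myself out of this patch entirely -- see further down --
the combined version I had in mind looked roughly like the sketch below.
Untested and illustrative only: it keeps the usual startblock/owner/offset
precedence of the rmap key comparison, but skips the rm_offset flag-bit
handling that the real thing has to do.)

	STATIC int64_t
	xfs_rmapbt_diff_two_keys(
		struct xfs_btree_cur		*cur,
		const union xfs_btree_key	*k1,
		const union xfs_btree_key	*k2,
		const union xfs_btree_key	*mask)
	{
		uint64_t			a, b;

		/* A null mask means "compare every key field", as today. */
		if (!mask || mask->rmap.rm_startblock) {
			a = be32_to_cpu(k1->rmap.rm_startblock);
			b = be32_to_cpu(k2->rmap.rm_startblock);
			if (a != b)
				return a > b ? 1 : -1;
		}

		if (!mask || mask->rmap.rm_owner) {
			a = be64_to_cpu(k1->rmap.rm_owner);
			b = be64_to_cpu(k2->rmap.rm_owner);
			if (a != b)
				return a > b ? 1 : -1;
		}

		if (!mask || mask->rmap.rm_offset) {
			/* (the real version must strip the offset flag bits first) */
			a = be64_to_cpu(k1->rmap.rm_offset);
			b = be64_to_cpu(k2->rmap.rm_offset);
			if (a != b)
				return a > b ? 1 : -1;
		}

		return 0;
	}
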
> 
> > > diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> > > index 58a05f0d1f1b..99baa8283049 100644
> > > --- a/fs/xfs/libxfs/xfs_btree.h
> > > +++ b/fs/xfs/libxfs/xfs_btree.h
> > > @@ -158,10 +158,17 @@ struct xfs_btree_ops {
> > >  			const union xfs_btree_rec *r1,
> > >  			const union xfs_btree_rec *r2);
> > > 
> > > +	/* mask a key for us */
> > > +	void	(*mask_key)(struct xfs_btree_cur *cur,
> > > +			union xfs_btree_key *out_key,
> > > +			const union xfs_btree_key *in_key,
> > > +			const union xfs_btree_key *mask);
> > > +
> > >  	/* decide if there's a gap in the keyspace between two keys */
> > >  	bool	(*has_key_gap)(struct xfs_btree_cur *cur,
> > >  			const union xfs_btree_key *key1,
> > > -			const union xfs_btree_key *key2);
> > > +			const union xfs_btree_key *key2,
> > > +			const union xfs_btree_key *mask);
> > >  };
> > > 
> > >  /*
> > > @@ -552,6 +559,7 @@ typedef bool (*xfs_btree_key_gap_fn)(struct xfs_btree_cur *cur,
> > >  int xfs_btree_scan_keyfill(struct xfs_btree_cur *cur,
> > >  		const union xfs_btree_irec *low,
> > >  		const union xfs_btree_irec *high,
> > > +		const union xfs_btree_irec *mask,
> > >  		enum xfs_btree_keyfill *outcome);
> > > 
> > >  bool xfs_btree_has_more_records(struct xfs_btree_cur *cur);
> > > diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > index fd48b95b4f4e..d429ca8d9dd8 100644
> > > --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> > > @@ -384,11 +384,14 @@ STATIC bool
> > >  xfs_inobt_has_key_gap(
> > >  	struct xfs_btree_cur		*cur,
> > >  	const union xfs_btree_key	*key1,
> > > -	const union xfs_btree_key	*key2)
> > > +	const union xfs_btree_key	*key2,
> > > +	const union xfs_btree_key	*mask)
> > >  {
> > >  	xfs_agino_t			next;
> > > 
> > > -	next = be32_to_cpu(key1->inobt.ir_startino) + XFS_INODES_PER_CHUNK;
> > > +	ASSERT(!mask || mask->inobt.ir_startino);
> > > +
> > > +	next = be32_to_cpu(key1->inobt.ir_startino) + 1;
> > >  	return next != be32_to_cpu(key2->inobt.ir_startino);
> > >  }
> > 
> > I think you just fixed the issue I noticed in the last patch....
> 
> Oops, I clearly committed this to the wrong patch.  Sorry about that.
> 
> > > diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
> > > index 08d47cbf4697..4c123b6dd080 100644
> > > --- a/fs/xfs/libxfs/xfs_rmap.c
> > > +++ b/fs/xfs/libxfs/xfs_rmap.c
> > > @@ -2685,13 +2685,18 @@ xfs_rmap_scan_keyfill(
> > >  {
> > >  	union xfs_btree_irec		low;
> > >  	union xfs_btree_irec		high;
> > > +	union xfs_btree_irec		mask;
> > > +
> > > +	/* Only care about space scans here */
> > > +	memset(&mask, 0, sizeof(low));
> > 
> > sizeof(mask)?
> 
> Yep.
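
(For what it's worth, the intent of that hunk, with the sizeof typo fixed,
is roughly the following.  Illustrative only -- the exact set of fields to
mark in the mask depends on how the key comparison ends up structured:)

	union xfs_btree_irec	mask;

	/*
	 * Only care about space scans here: mark rm_startblock as the only
	 * key field that participates in the comparison; the zeroed
	 * rm_owner/rm_offset fields are ignored.
	 */
	memset(&mask, 0, sizeof(mask));
	mask.r.rm_startblock = 1;
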
> 
> > > diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
> > > index d64143a842ce..9ca60f709c4b 100644
> > > --- a/fs/xfs/libxfs/xfs_rmap_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_rmap_btree.c
> > > @@ -433,16 +433,55 @@ xfs_rmapbt_recs_inorder(
> > >  	return 0;
> > >  }
> > > 
> > > +STATIC void
> > > +xfs_rmapbt_mask_key(
> > > +	struct xfs_btree_cur		*cur,
> > > +	union xfs_btree_key		*out_key,
> > > +	const union xfs_btree_key	*in_key,
> > > +	const union xfs_btree_key	*mask)
> > > +{
> > > +	memset(out_key, 0, sizeof(union xfs_btree_key));
> > > +
> > > +	if (mask->rmap.rm_startblock)
> > > +		out_key->rmap.rm_startblock = in_key->rmap.rm_startblock;
> > > +	if (mask->rmap.rm_owner)
> > > +		out_key->rmap.rm_owner = in_key->rmap.rm_owner;
> > > +	if (mask->rmap.rm_offset)
> > > +		out_key->rmap.rm_offset = in_key->rmap.rm_offset;
> > > +}
> > 
> > So the mask fields are just used as boolean state to select what
> > should be copied into the masked key?
> 
> Yes.  A zeroed keymask field will be ignored, a nonzero keymask field
> will be retained.
> 
> > > +
> > >  STATIC bool
> > >  xfs_rmapbt_has_key_gap(
> > >  	struct xfs_btree_cur		*cur,
> > >  	const union xfs_btree_key	*key1,
> > > -	const union xfs_btree_key	*key2)
> > > +	const union xfs_btree_key	*key2,
> > > +	const union xfs_btree_key	*mask)
> > >  {
> > > -	xfs_agblock_t			next;
> > > +	bool				reflink = xfs_has_reflink(cur->bc_mp);
> > > +	uint64_t			x, y;
> > > 
> > > -	next = be32_to_cpu(key1->rmap.rm_startblock) + 1;
> > > -	return next != be32_to_cpu(key2->rmap.rm_startblock);
> > > +	if (mask->rmap.rm_offset) {
> > > +		x = be64_to_cpu(key1->rmap.rm_offset) + 1;
> > > +		y = be64_to_cpu(key2->rmap.rm_offset);
> > > +		if ((reflink && x < y) || (!reflink && x != y))
> > > +			return true;
> > > +	}
> > > +
> > > +	if (mask->rmap.rm_owner) {
> > > +		x = be64_to_cpu(key1->rmap.rm_owner) + 1;
> > > +		y = be64_to_cpu(key2->rmap.rm_owner);
> > > +		if ((reflink && x < y) || (!reflink && x != y))
> > > +			return true;
> > > +	}
> > > +
> > > +	if (mask->rmap.rm_startblock) {
> > > +		x = be32_to_cpu(key1->rmap.rm_startblock) + 1;
> > > +		y = be32_to_cpu(key2->rmap.rm_startblock);
> > > +		if ((reflink && x < y) || (!reflink && x != y))
> > > +			return true;
> > > +	}
> > > +
> > > +	return false;
> > 
> > Urk. That needs a comment explaining what all the mystery reflink
> > logic is doing.

I decided to fix this in the "_scan_keyfill" patch by changing the
->keys_contiguous function to return whether the keys are contiguous,
have a gap, or overlap.  That way, I could make _has_records detect
improperly overlapping records and return EFSCORRUPTED.

Having done that, it's no longer necessary to have all this reflink
logic to handle rmaps for shared blocks.

Then I thought about the complexity of this patch, and realized that
there's only one user of the btree key masks, and that is
xfs_rmapbt_has_records.  That single user masks off everything in the
xfs_btree_key except for rm_startblock.  All other callers do not
provide masks at all.

IOWs, I could avoid introducing all this untestable complexity by
dropping this patch entirely.  That would leave the function incomplete,
but that makes things simpler for now because I no longer have to worry
about making the comparison handle numeric gaps in the keyspace where
it's never possible to have keys (e.g. detecting records for a bmbt
block and skipping the offset comparison).

So.  I'm dropping this patch.
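
To be concrete about the ->keys_contiguous change mentioned above, the
shape I have in mind is roughly this.  The enum, the helper, and the
XFS_BTREE_KEYFILL_SPARSE outcome name are illustrative, not final:

	/* Result of comparing two adjacent keys during a keyspace scan. */
	enum xfs_btree_keycmp {
		XFS_BTREE_KEYCMP_OVERLAP = 0,	/* key ranges overlap: corruption */
		XFS_BTREE_KEYCMP_CONTIGUOUS,	/* high(key1) + 1 == low(key2) */
		XFS_BTREE_KEYCMP_GAP,		/* hole in the keyspace between them */
	};

	/* Fold one adjacent-key comparison into the running scan state. */
	STATIC int
	xfs_btree_scan_advance(
		struct xfs_btree_cur		*cur,
		const union xfs_btree_key	*prev_hkey,
		const union xfs_btree_key	*this_lkey,
		enum xfs_btree_keyfill		*outcome)
	{
		switch (cur->bc_ops->keys_contiguous(cur, prev_hkey, this_lkey)) {
		case XFS_BTREE_KEYCMP_OVERLAP:
			/* Records must never overlap each other. */
			return -EFSCORRUPTED;
		case XFS_BTREE_KEYCMP_GAP:
			/* At least one hole: the range is at best partially full. */
			*outcome = XFS_BTREE_KEYFILL_SPARSE;
			return 0;
		case XFS_BTREE_KEYCMP_CONTIGUOUS:
			/* No hole so far; keep scanning. */
			return 0;
		}
		return -EFSCORRUPTED;
	}
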
--D

> > It also needs to explain the order of precedence on
> > the mask checks and why that order is important (or not!).
> 
> For the purpose of comparing two keys to decide if there's a gap in the
> keyspace, we do the comparisons in order of most sensitive to least
> sensitive.  This is the opposite order of diff_two_keys -- if two keys
> have the same startblock and inode but discontiguous file offsets,
> there's a gap.  If two keys have the same startblock but different
> owners, there's a gap regardless of the offset.  What that means is a
> bit murky, since the only user of this functionality passes a mask so
> that we only compare the perag parts of the keys.
> 
> Except for the masking, I think the comparison logic all got committed
> to the wrong patch. :(
> 
> --D
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@xxxxxxxxxxxxx