On Mon, Jun 20, 2022 at 12:33:11PM +0000, Abhradeep Chakraborty via GitGitGadget wrote: > From: Abhradeep Chakraborty <chakrabortyabhradeep79@xxxxxxxxx> > > Teach git to write bitmap lookup table extension. The table has the > following information: > > - `N` no of Object ids of each bitmapped commits s/no/number, s/Object/object, s/ids/IDs, and s/commits/commit > - A list of offset, xor-offset pair; the i'th pair denotes the > offsets and xor-offsets of i'th commit in the previous list. s/pair/pairs > - 4-byte integer denoting the flags > > Co-authored-by: Taylor Blau <ttaylorr@xxxxxxxxxx> > Mentored-by: Taylor Blau <ttaylorr@xxxxxxxxxx> > Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@xxxxxxxxx> > Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@xxxxxxxxx> > --- > pack-bitmap-write.c | 59 +++++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 57 insertions(+), 2 deletions(-) > > diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c > index c43375bd344..9e88a64dd65 100644 > --- a/pack-bitmap-write.c > +++ b/pack-bitmap-write.c > @@ -650,7 +650,8 @@ static const struct object_id *oid_access(size_t pos, const void *table) > > static void write_selected_commits_v1(struct hashfile *f, > struct pack_idx_entry **index, > - uint32_t index_nr) > + uint32_t index_nr, > + off_t *offsets) > { > int i; > > @@ -663,6 +664,9 @@ static void write_selected_commits_v1(struct hashfile *f, > if (commit_pos < 0) > BUG("trying to write commit not in index"); > > + if (offsets) > + offsets[i] = hashfile_total(f); > + Makes sense; we record the offset for the ith commit as however many bytes we've already written into the hashfile up to this point, since the subsequent byte will begin the bitmap (well, the preceding few bytes of it, anyways) itself. > hashwrite_be32(f, commit_pos); > hashwrite_u8(f, stored->xor_offset); > hashwrite_u8(f, stored->flags); > @@ -671,6 +675,49 @@ static void write_selected_commits_v1(struct hashfile *f, > } > } > > +static int table_cmp(const void *_va, const void *_vb) > +{ > + return oidcmp(&writer.selected[*(uint32_t*)_va].commit->object.oid, > + &writer.selected[*(uint32_t*)_vb].commit->object.oid); This implementation looks right to me, but perhaps we should expand it out from the one-liner here to make it more readable. Perhaps something like: static int table_cmp(const void *_va, const void *_vb) { struct commit *c1 = &writer.selected[*(uint32_t*)_va]; struct commit *c2 = &writer.selected[*(uint32_t*)_vb]; return oidcmp(&c1->object.oid, &c2->object.oid); } which is arguably slightly more readable than the one-liner (but I don't feel that strongly about it.) > +static void write_lookup_table(struct hashfile *f, > + off_t *offsets) > +{ > + uint32_t i; > + uint32_t flags = 0; > + uint32_t *table, *table_inv; > + > + ALLOC_ARRAY(table, writer.selected_nr); > + ALLOC_ARRAY(table_inv, writer.selected_nr); > + > + for (i = 0; i < writer.selected_nr; i++) > + table[i] = i; > + QSORT(table, writer.selected_nr, table_cmp); > + for (i = 0; i < writer.selected_nr; i++) > + table_inv[table[i]] = i; Right... so table[0] will give us the index into writer.selected of the commit with the earliest OID in lexicographic order. And table_inv goes the other way around: table_inv[i] will tell us the lexicographic position of the commit at writer.selected[i]. > + for (i = 0; i < writer.selected_nr; i++) { > + struct bitmapped_commit *selected = &writer.selected[table[i]]; > + struct object_id *oid = &selected->commit->object.oid; > + > + hashwrite(f, oid->hash, the_hash_algo->rawsz); > + } > + for (i = 0; i < writer.selected_nr; i++) { > + struct bitmapped_commit *selected = &writer.selected[table[i]]; > + > + hashwrite_be32(f, offsets[table[i]]); > + hashwrite_be32(f, selected->xor_offset > + ? table_inv[table[i] - selected->xor_offset] ...which we need to discover the position of the XOR'd bitmap. Though I'm not sure if I remember why `table[i] - selected->xor_offset` is right and not `i - selected->xor_offset`. Thanks, Taylor