On Wed, Jul 20, 2022 at 06:38:20PM +0000, Abhradeep Chakraborty via GitGitGadget wrote: > From: Abhradeep Chakraborty <chakrabortyabhradeep79@xxxxxxxxx> > > The bitmap lookup table extension was documented by an earlier > change, but Git does not yet know how to write that extension. > > Teach Git to write bitmap lookup table extension. The table contains > the list of `N` <commit_pos, offset, xor_row>` triplets. These > triplets are sorted according to their commit pos (ascending order). > The meaning of each data in the i'th triplet is given below: > > - commit_pos stores commit position (in the pack-index or midx). > It is a 4 byte network byte order unsigned integer. > > - offset is the position (in the bitmap file) from which that > commit's bitmap can be read. > > - xor_row is the position of the triplet in the lookup table > whose bitmap is used to compress this bitmap, or `0xffffffff` > if no such bitmap exists. > > Mentored-by: Taylor Blau <me@xxxxxxxxxxxx> > Co-mentored-by: Kaartic Sivaraam <kaartic.sivaraam@xxxxxxxxx> > Co-authored-by: Taylor Blau <me@xxxxxxxxxxxx> > Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@xxxxxxxxx> > --- > pack-bitmap-write.c | 112 ++++++++++++++++++++++++++++++++++++++++---- > pack-bitmap.h | 5 +- > 2 files changed, 107 insertions(+), 10 deletions(-) > > diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c > index c43375bd344..9843790cb60 100644 > --- a/pack-bitmap-write.c > +++ b/pack-bitmap-write.c > @@ -650,20 +650,19 @@ static const struct object_id *oid_access(size_t pos, const void *table) > > static void write_selected_commits_v1(struct hashfile *f, > struct pack_idx_entry **index, > - uint32_t index_nr) > + uint32_t index_nr, > + off_t *offsets, > + uint32_t *commit_positions) > { > int i; > > for (i = 0; i < writer.selected_nr; ++i) { > struct bitmapped_commit *stored = &writer.selected[i]; > > - int commit_pos = > - oid_pos(&stored->commit->object.oid, index, index_nr, oid_access); > + if (offsets) > + offsets[i] = hashfile_total(f); > > - if (commit_pos < 0) > - BUG("trying to write commit not in index"); > - > - hashwrite_be32(f, commit_pos); > + hashwrite_be32(f, commit_positions[i]); I wonder if it would make this patch a little more readable to construct and use the commit_positions array as a single preparatory step before this commit. What do you think? > +static void write_lookup_table(struct hashfile *f, > + struct pack_idx_entry **index, > + uint32_t index_nr, > + off_t *offsets, > + uint32_t *commit_positions) > +{ > + uint32_t i; > + uint32_t *table, *table_inv; > + > + ALLOC_ARRAY(table, writer.selected_nr); > + ALLOC_ARRAY(table_inv, writer.selected_nr); > + > + for (i = 0; i < writer.selected_nr; i++) > + table[i] = i; > + > + /* > + * At the end of this sort table[j] = i means that the i'th > + * bitmap corresponds to j'th bitmapped commit (among the selected > + * commits) in lex order of OIDs. > + */ > + QSORT_S(table, writer.selected_nr, table_cmp, commit_positions); > + > + /* table_inv helps us discover that relationship (i'th bitmap > + * to j'th commit by j = table_inv[i]) > + */ > + for (i = 0; i < writer.selected_nr; i++) > + table_inv[table[i]] = i; > + > + trace2_region_enter("pack-bitmap-write", "writing_lookup_table", the_repository); > + for (i = 0; i < writer.selected_nr; i++) { > + struct bitmapped_commit *selected = &writer.selected[table[i]]; > + uint32_t xor_offset = selected->xor_offset; > + uint32_t xor_row; > + > + if (xor_offset) { > + /* > + * xor_index stores the index (in the bitmap entries) > + * of the corresponding xor bitmap. But we need to convert > + * this index into lookup table's index. So, table_inv[xor_index] > + * gives us the index position w.r.t. the lookup table. > + * > + * If "k = table[i] - xor_offset" then the xor base is the k'th > + * bitmap. `table_inv[k]` gives us the position of that bitmap > + * in the lookup table. > + */ > + uint32_t xor_index = table[i] - xor_offset; > + xor_row = table_inv[xor_index]; > + } else { > + xor_row = 0xffffffff; > + } > + > + hashwrite_be32(f, commit_positions[table[i]]); > + hashwrite_be64(f, (uint64_t)offsets[table[i]]); > + hashwrite_be32(f, xor_row); > + } > + trace2_region_leave("pack-bitmap-write", "writing_lookup_table", the_repository); > + > + free(table); > + free(table_inv); > +} > @@ -715,7 +791,25 @@ void bitmap_writer_finish(struct pack_idx_entry **index, > dump_bitmap(f, writer.trees); > dump_bitmap(f, writer.blobs); > dump_bitmap(f, writer.tags); > - write_selected_commits_v1(f, index, index_nr); > + > + ALLOC_ARRAY(commit_positions, writer.selected_nr); > + for (uint32_t i = 0; i < writer.selected_nr; ++i) { Nit; we don't typically write for-loop expressions with variable declarations inside of them. Make sure to declare i outside of the loop, and then this becomes: for (i = 0; i < writer.selected_nr; i++) (also, we typically use the postfix ++ operator, that is "i++" instead of "++i" unless there is a reason to prefer the latter over the former). Thanks, Taylor