Here is a reroll my topic to introduce pseudo-merge bitmaps. Much is unchanged since last time, but notable changes (in response to Peff's review of the first five or so patches of this topic) include: - Rebased onto 2.45, so this is now based on 'master', which is at 786a3e4b8d (Git 2.45, 2024-04-29) at the time of writing. - Dropped patch 2/24 from the first round as it is no longer necessary. - Introduced some documentation and fixed a couple of comments around ewah_bitmap_is_subset() and bitmap_is_subset() to clarify which argument is supposed to be a subset of the other. Otherwise, a range-diff is included below for convenience. Thanks in advance for your review! Taylor Blau (23): Documentation/technical: describe pseudo-merge bitmaps format ewah: implement `ewah_bitmap_is_subset()` pack-bitmap: drop unused `max_bitmaps` parameter pack-bitmap: move some initialization to `bitmap_writer_init()` pseudo-merge.ch: initial commit pack-bitmap-write: support storing pseudo-merge commits pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public pseudo-merge: implement support for selecting pseudo-merge commits pack-bitmap-write.c: select pseudo-merge commits pack-bitmap-write.c: write pseudo-merge table pack-bitmap: extract `read_bitmap()` function pseudo-merge: scaffolding for reads pack-bitmap.c: read pseudo-merge extension pseudo-merge: implement support for reading pseudo-merge commits ewah: implement `ewah_bitmap_popcount()` pack-bitmap: implement test helpers for pseudo-merge t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` pack-bitmap.c: use pseudo-merges during traversal pack-bitmap: extra trace2 information ewah: `bitmap_equals_ewah()` pseudo-merge: implement support for finding existing merges t/perf: implement performace tests for pseudo-merge bitmaps Documentation/config.txt | 2 + Documentation/config/bitmap-pseudo-merge.txt | 75 ++ Documentation/technical/bitmap-format.txt | 205 +++++ Makefile | 1 + builtin/pack-objects.c | 3 +- ewah/bitmap.c | 76 ++ ewah/ewok.h | 8 + midx-write.c | 3 +- pack-bitmap-write.c | 275 ++++++- pack-bitmap.c | 359 ++++++++- pack-bitmap.h | 16 +- pseudo-merge.c | 739 +++++++++++++++++++ pseudo-merge.h | 218 ++++++ t/helper/test-bitmap.c | 34 +- t/perf/p5333-pseudo-merge-bitmaps.sh | 32 + t/t5333-pseudo-merge-bitmaps.sh | 389 ++++++++++ t/test-lib-functions.sh | 12 +- 17 files changed, 2386 insertions(+), 61 deletions(-) create mode 100644 Documentation/config/bitmap-pseudo-merge.txt create mode 100644 pseudo-merge.c create mode 100644 pseudo-merge.h create mode 100755 t/perf/p5333-pseudo-merge-bitmaps.sh create mode 100755 t/t5333-pseudo-merge-bitmaps.sh Range-diff against v1: 1: 76e7e3b9cca = 1: 43fd5e35971 Documentation/technical: describe pseudo-merge bitmaps format 2: 21d8f9dc2b4 < -: ----------- config: repo_config_get_expiry() 3: 1347571ef4c ! 2: 290d928325d ewah: implement `ewah_bitmap_is_subset()` @@ ewah/bitmap.c: void bitmap_or(struct bitmap *self, const struct bitmap *other) + * Otherwise, compare the next two pairs of + * words. If the word from `self` has bit(s) not + * in the word from `other`, `self` is not a -+ * proper subset of `other`. ++ * subset of `other`. + */ + return 0; + } @@ ewah/bitmap.c: void bitmap_or(struct bitmap *self, const struct bitmap *other) + * If we got to this point, there may be zero or more words + * remaining in `self`, with no remaining words left in `other`. + * If there are any bits set in the remaining word(s) in `self`, -+ * then `self` is not a proper subset of `other`. ++ * then `self` is not a subset of `other`. + */ + while (ewah_iterator_next(&word, &it)) + if (word) @@ ewah/bitmap.c: void bitmap_or(struct bitmap *self, const struct bitmap *other) size_t original_size = self->word_alloc; ## ewah/ewok.h ## -@@ ewah/ewok.h: int bitmap_get(struct bitmap *self, size_t pos); +@@ ewah/ewok.h: void bitmap_unset(struct bitmap *self, size_t pos); + int bitmap_get(struct bitmap *self, size_t pos); void bitmap_free(struct bitmap *self); int bitmap_equals(struct bitmap *self, struct bitmap *other); ++ ++/* ++ * Both `bitmap_is_subset()` and `ewah_bitmap_is_subset()` return 1 if the set ++ * of bits in 'self' are a subset of the bits in 'other'. Returns 0 otherwise. ++ */ int bitmap_is_subset(struct bitmap *self, struct bitmap *other); +int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other); 4: c6a08dae037 ! 3: 5160859f7f3 pack-bitmap: drop unused `max_bitmaps` parameter @@ builtin/pack-objects.c: static void write_pack_file(void) die(_("failed to write bitmap index")); bitmap_writer_finish(written_list, nr_written, - ## midx.c ## -@@ midx.c: static int write_midx_bitmap(const char *midx_name, + ## midx-write.c ## +@@ midx-write.c: static int write_midx_bitmap(const char *midx_name, for (i = 0; i < pdata->nr_objects; i++) index[pack_order[i]] = &pdata->objects[i].idx; 5: a6531656739 ! 4: 3d7d930b1c5 pack-bitmap: move some initialization to `bitmap_writer_init()` @@ builtin/pack-objects.c: static void write_pack_file(void) bitmap_writer_build_type_index( &to_pack, written_list, nr_written); - ## midx.c ## -@@ midx.c: static int write_midx_bitmap(const char *midx_name, + ## midx-write.c ## +@@ midx-write.c: static int write_midx_bitmap(const char *midx_name, for (i = 0; i < pdata->nr_objects; i++) index[i] = &pdata->objects[i].idx; 6: c6f9170af0f = 5: e7a87cf7d4e pseudo-merge.ch: initial commit 7: 7acdee2b5f2 = 6: ee33a703245 pack-bitmap-write: support storing pseudo-merge commits 8: 4fdd7dda274 = 7: 9c6d09bf874 pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` 9: d74cf3e484d = 8: dfd4b73d12e pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public 10: 323e1250b24 = 9: 86a1e4b8b9b pseudo-merge: implement support for selecting pseudo-merge commits 11: bf6b0d8601e = 10: 12b432e3a8a pack-bitmap-write.c: select pseudo-merge commits 12: 4c594f3faa8 = 11: 6ce805d061e pack-bitmap-write.c: write pseudo-merge table 13: 7a31a932ab3 = 12: 60f6b310213 pack-bitmap: extract `read_bitmap()` function 14: 7e4d051f37a = 13: 9465313691b pseudo-merge: scaffolding for reads 15: 7bb644b2b0c = 14: 5894f3d5369 pack-bitmap.c: read pseudo-merge extension 16: 792cc863154 = 15: 7dbee8bcbdf pseudo-merge: implement support for reading pseudo-merge commits 17: 8fb7f7ab37b = 16: 09650aa53e9 ewah: implement `ewah_bitmap_popcount()` 18: c839e1fed15 = 17: 7b5ea56d053 pack-bitmap: implement test helpers for pseudo-merge 19: 7d3b88e6fd6 = 18: 006abdd1698 t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` 20: c18694ade2a = 19: 3f85e5b90f5 pack-bitmap.c: use pseudo-merges during traversal 21: d38ebeba419 = 20: 5fac186df64 pack-bitmap: extra trace2 information 22: 1eb10c190ba ! 21: b5aea8e57f8 ewah: `bitmap_equals_ewah()` @@ ewah/ewok.h: void bitmap_unset(struct bitmap *self, size_t pos); void bitmap_free(struct bitmap *self); int bitmap_equals(struct bitmap *self, struct bitmap *other); +int bitmap_equals_ewah(struct bitmap *self, struct ewah_bitmap *other); - int bitmap_is_subset(struct bitmap *self, struct bitmap *other); - int ewah_bitmap_is_subset(struct ewah_bitmap *self, struct bitmap *other); + /* + * Both `bitmap_is_subset()` and `ewah_bitmap_is_subset()` return 1 if the set 23: 4ae4f0eaae5 = 22: 61ddb574285 pseudo-merge: implement support for finding existing merges 24: a05ad42202d = 23: 2bd830d35dd t/perf: implement performace tests for pseudo-merge bitmaps base-commit: 786a3e4b8d754d2b14b1208b98eeb0a554ef19a8 -- 2.45.0.23.gc6f94b99219