Here is another reroll my topic to introduce pseudo-merge bitmaps. The implementation is still relatively unchanged compared to last time, save for the review that Peff provided on the remaining parts of this series. As usual, there is a range-diff below, but the significant changes since last time are as follows: - replace git_config_float() with git_config_double() (and matching tweaks in the callers) - add a NOTE to gitpacking(7) reflecting that the pseudo-merge bitmaps feature is considered experimental - add a length check when reading extended pseudo merges via the pseudo_merge_ext_at() function - replace the new `--date` option for `test_commit_bulk` with a `--notick` option (and set the GIT_COMMITTER_DATE values appropriately at the callers) - fix broken performance tests due to a typo on "pseudo", and include results from a large repository. The series is still based on 'tb/pack-bitmap-write-cleanups'. Taylor Blau (24): Documentation/gitpacking.txt: initial commit Documentation/gitpacking.txt: describe pseudo-merge bitmaps Documentation/technical: describe pseudo-merge bitmaps format ewah: implement `ewah_bitmap_is_subset()` pack-bitmap: move some initialization to `bitmap_writer_init()` pseudo-merge.ch: initial commit pack-bitmap-write: support storing pseudo-merge commits pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public config: introduce `git_config_double()` pseudo-merge: implement support for selecting pseudo-merge commits pack-bitmap-write.c: write pseudo-merge table pack-bitmap: extract `read_bitmap()` function pseudo-merge: scaffolding for reads pack-bitmap.c: read pseudo-merge extension pseudo-merge: implement support for reading pseudo-merge commits ewah: implement `ewah_bitmap_popcount()` pack-bitmap: implement test helpers for pseudo-merge t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()` pack-bitmap.c: use pseudo-merges during traversal pack-bitmap: extra trace2 information ewah: `bitmap_equals_ewah()` pseudo-merge: implement support for finding existing merges t/perf: implement performance tests for pseudo-merge bitmaps Documentation/Makefile | 1 + Documentation/config.txt | 2 + Documentation/config/bitmap-pseudo-merge.txt | 91 +++ Documentation/gitpacking.txt | 189 +++++ Documentation/technical/bitmap-format.txt | 132 ++++ Makefile | 1 + builtin/pack-objects.c | 3 +- config.c | 9 + config.h | 7 + ewah/bitmap.c | 76 ++ ewah/ewok.h | 8 + midx-write.c | 2 +- object.h | 2 +- pack-bitmap-write.c | 274 ++++++- pack-bitmap.c | 359 ++++++++- pack-bitmap.h | 19 +- parse.c | 29 + parse.h | 1 + pseudo-merge.c | 756 +++++++++++++++++++ pseudo-merge.h | 216 ++++++ t/helper/test-bitmap.c | 34 +- t/perf/p5333-pseudo-merge-bitmaps.sh | 32 + t/t5333-pseudo-merge-bitmaps.sh | 393 ++++++++++ t/test-lib-functions.sh | 9 +- 24 files changed, 2590 insertions(+), 55 deletions(-) create mode 100644 Documentation/config/bitmap-pseudo-merge.txt create mode 100644 Documentation/gitpacking.txt create mode 100644 pseudo-merge.c create mode 100644 pseudo-merge.h create mode 100755 t/perf/p5333-pseudo-merge-bitmaps.sh create mode 100755 t/t5333-pseudo-merge-bitmaps.sh Range-diff against v3: -: ----------- > 1: 0f20c9becf4 Documentation/gitpacking.txt: initial commit 1: 528b591bd84 ! 2: 48afaa74928 Documentation/gitpacking.txt: describe pseudo-merge bitmaps @@ Documentation/gitpacking.txt: There are many aspects of packing in Git that are +== Pseudo-merge bitmaps + ++NOTE: Pseudo-merge bitmaps are considered an experimental feature, so ++the configuration and many of the ideas are subject to change. ++ +=== Background + +Reachability bitmaps are most efficient when we have on-disk stored 2: 12f318b3d7e = 3: 44046f83c1a Documentation/technical: describe pseudo-merge bitmaps format 3: 40eb6137618 = 4: 211d6f14128 ewah: implement `ewah_bitmap_is_subset()` 4: 487fb7c6e9c = 5: 650cac2dcf9 pack-bitmap: move some initialization to `bitmap_writer_init()` 5: 827732acf99 = 6: 6647d8832ce pseudo-merge.ch: initial commit 6: 8608dd1860f = 7: e8ef1ef5ee4 pack-bitmap-write: support storing pseudo-merge commits 7: 99d2b6872ba = 8: fe458728c8a pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` 8: e7209c60fa5 = 9: 6bf372f4020 pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public 9: 3070135eb4b ! 10: 6c77671ae9c config: introduce git_config_float() @@ Metadata Author: Taylor Blau <me@xxxxxxxxxxxx> ## Commit message ## - config: introduce git_config_float() + config: introduce `git_config_double()` - Future commits will want to parse a floating point value from - configuration, but we have no way to parse such a value prior to this - patch. + Future commits will want to parse a double-precision floating point + value from configuration, but we have no way to parse such a value prior + to this patch. - The core of the routine is implemented in git_parse_float(). Unlike + The core of the routine is implemented in git_parse_double(). Unlike git_parse_unsigned() and git_parse_signed(), however, the function - implemented here only works on type "float", and not related types like - "double", or "long double". + implemented here only works on type "double", and not related types like + "float", or "long double". - This is because "double" and "long double" use different functions to - convert from ASCII strings to floating point values (strtod() and + This is because "float" and "long double" use different functions to + convert from ASCII strings to floating point values (strtof() and strtold(), respectively). Likewise, there is no pointer type that can assign to any of these values (except for "void *"), so the only way to define this trio of functions would be with a macro expansion that is parameterized over the floating point type and conversion function. That is all doable, but likely to be overkill given our current needs, - which is only to parse floats. + which is only to parse double-precision floats. Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> @@ config.c: ssize_t git_config_ssize_t(const char *name, const char *value, return ret; } -+float git_config_float(const char *name, const char *value, -+ const struct key_value_info *kvi) ++double git_config_double(const char *name, const char *value, ++ const struct key_value_info *kvi) +{ -+ float ret; -+ if (!git_parse_float(value, &ret)) ++ double ret; ++ if (!git_parse_double(value, &ret)) + die_bad_number(name, value, kvi); + return ret; +} @@ config.h: unsigned long git_config_ulong(const char *, const char *, const struct key_value_info *); +/** -+ * Identical to `git_config_int`, but for floating point values. ++ * Identically to `git_config_double`, but for double-precision floating point ++ * values. + */ -+float git_config_float(const char *, const char *, -+ const struct key_value_info *); ++double git_config_double(const char *, const char *, ++ const struct key_value_info *); + /** * Same as `git_config_bool`, except that integers are returned as-is, and @@ parse.c: int git_parse_ssize_t(const char *value, ssize_t *ret) return 1; } -+int git_parse_float(const char *value, float *ret) ++int git_parse_double(const char *value, double *ret) +{ + char *end; -+ float val; ++ double val; + uintmax_t factor; + + if (!value || !*value) { @@ parse.c: int git_parse_ssize_t(const char *value, ssize_t *ret) + } + + errno = 0; -+ val = strtof(value, &end); ++ val = strtod(value, &end); + if (errno == ERANGE) + return 0; + if (end == value) { @@ parse.h: int git_parse_ssize_t(const char *, ssize_t *); int git_parse_ulong(const char *, unsigned long *); int git_parse_int(const char *value, int *ret); int git_parse_int64(const char *value, int64_t *ret); -+int git_parse_float(const char *value, float *ret); ++int git_parse_double(const char *value, double *ret); /** * Same as `git_config_bool`, except that it returns -1 on error rather 10: 3029473c094 ! 11: 180072ce848 pseudo-merge: implement support for selecting pseudo-merge commits @@ Documentation/config/bitmap-pseudo-merge.txt (new) @@ +NOTE: The configuration options in `bitmapPseudoMerge.*` are considered +EXPERIMENTAL and may be subject to change or be removed entirely in the -+future. ++future. For more information about the pseudo-merge bitmap feature, see ++the "Pseudo-merge bitmaps" section of linkgit:gitpacking[7]. + +bitmapPseudoMerge.<name>.pattern:: + Regular expression used to match reference names. Commits @@ pseudo-merge.c +#include "alloc.h" +#include "progress.h" + -+#define DEFAULT_PSEUDO_MERGE_DECAY 1.0f ++#define DEFAULT_PSEUDO_MERGE_DECAY 1.0 +#define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64 +#define DEFAULT_PSEUDO_MERGE_SAMPLE_RATE 1 +#define DEFAULT_PSEUDO_MERGE_THRESHOLD approxidate("1.week.ago") +#define DEFAULT_PSEUDO_MERGE_STABLE_THRESHOLD approxidate("1.month.ago") +#define DEFAULT_PSEUDO_MERGE_STABLE_SIZE 512 + -+static float gitexp(float base, int exp) ++static double gitexp(double base, int exp) +{ -+ float result = 1; ++ double result = 1; + while (1) { + if (exp % 2) + result *= base; @@ pseudo-merge.c + const struct pseudo_merge_matches *matches, + uint32_t i) +{ -+ float C = 0.0f; ++ double C = 0.0f; + uint32_t n; + + /* @@ pseudo-merge.c + * { 5012, 1772, 964, 626, 448, 341, 271, 221, 186, 158 } + */ + for (n = 0; n < group->max_merges; n++) -+ C += 1.0f / gitexp(n + 1, group->decay); ++ C += 1.0 / gitexp(n + 1, group->decay); + C = matches->unstable_nr / C; + + return (uint32_t)((C / gitexp(i + 1, group->decay)) + 0.5); @@ pseudo-merge.c + + strbuf_release(&re); + } else if (!strcmp(key, "decay")) { -+ group->decay = git_config_float(var, value, ctx->kvi); ++ group->decay = git_config_double(var, value, ctx->kvi); + if (group->decay < 0) { + warning(_("%s must be non-negative, using default"), var); + group->decay = DEFAULT_PSEUDO_MERGE_DECAY; + } + } else if (!strcmp(key, "samplerate")) { -+ group->sample_rate = git_config_float(var, value, ctx->kvi); ++ group->sample_rate = git_config_double(var, value, ctx->kvi); + if (!(0 <= group->sample_rate && group->sample_rate <= 1)) { + warning(_("%s must be between 0 and 1, using default"), var); + group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE; @@ pseudo-merge.c + struct commit *c = matches->unstable[j]; + struct pseudo_merge_commit_idx *pmc; + -+ if (j % (uint32_t)(1.0f / group->sample_rate)) ++ if (j % (uint32_t)(1.0 / group->sample_rate)) + continue; + + pmc = pseudo_merge_idx(writer->pseudo_merge_commits, @@ pseudo-merge.h + * Pseudo-merge grouping parameters. See git-config(1) for + * more information. + */ -+ float decay; ++ double decay; + int max_merges; -+ float sample_rate; ++ double sample_rate; + int stable_size; + timestamp_t threshold; + timestamp_t stable_threshold; 11: 311226f65c2 = 12: 90df19e43f5 pack-bitmap-write.c: write pseudo-merge table 12: 55dd7a8023e = 13: c653a10f8e4 pack-bitmap: extract `read_bitmap()` function 13: 3cc5434e44e = 14: 435ac048003 pseudo-merge: scaffolding for reads 14: 7664f5f9648 = 15: fa7a948964c pack-bitmap.c: read pseudo-merge extension 15: 8ba0a9c5402 ! 16: 3a72e66cb69 pseudo-merge: implement support for reading pseudo-merge commits @@ pseudo-merge.c #include "progress.h" +#include "hex.h" - #define DEFAULT_PSEUDO_MERGE_DECAY 1.0f + #define DEFAULT_PSEUDO_MERGE_DECAY 1.0 #define DEFAULT_PSEUDO_MERGE_MAX_MERGES 64 @@ pseudo-merge.c: void free_pseudo_merge_map(struct pseudo_merge_map *pm) } @@ pseudo-merge.c: void free_pseudo_merge_map(struct pseudo_merge_map *pm) + return error(_("extended pseudo-merge read out-of-bounds " + "(%"PRIuMAX" >= %"PRIuMAX")"), + (uintmax_t)at, (uintmax_t)pm->map_size); ++ if (at + 4 >= pm->map_size) ++ return error(_("extended pseudo-merge entry is too short " ++ "(%"PRIuMAX" >= %"PRIuMAX")"), ++ (uintmax_t)(at + 4), (uintmax_t)pm->map_size); + + ext->nr = get_be32(pm->map + at); + ext->ptr = pm->map + at + sizeof(uint32_t); 16: 2c02f303b6f = 17: 42a836fda8a ewah: implement `ewah_bitmap_popcount()` 17: 82cce72bf55 = 18: 06ba1a5bbfd pack-bitmap: implement test helpers for pseudo-merge 18: 890f6c4b9de ! 19: 936f6d1b7e3 t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` @@ Metadata Author: Taylor Blau <me@xxxxxxxxxxxx> ## Commit message ## - t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` + t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()` One of the tests we'll want to add for pseudo-merge bitmaps needs to be able to generate a large number of commits at a specific date. - Support the `--date` option (with identical semantics to the `--date` - option for `test_commit()`) within `test_commit_bulk` as a prerequisite - for that. + Support the `--notick` option (with identical semantics to the + `--notick` option for `test_commit()`) within `test_commit_bulk` as a + prerequisite for that. Callers can then set the various _DATE variables + themselves. Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> @@ t/test-lib-functions.sh: test_commit_bulk () { filename="${1#--*=}-%s.t" contents="${1#--*=} %s" ;; -+ --date) ++ --notick) + notick=yes -+ GIT_COMMITTER_DATE="$2" -+ GIT_AUTHOR_DATE="$2" -+ shift + ;; -*) BUG "invalid test_commit_bulk option: $1" 19: 41691824f78 ! 20: cad38608aae pack-bitmap.c: use pseudo-merges during traversal @@ t/t5333-pseudo-merge-bitmaps.sh (new) + new="1672549200" && # 2023-01-01 + old="1641013200" && # 2022-01-01 + -+ test_commit_bulk --message="new" --date "$new +0000" 128 && -+ test_commit_bulk --message="old" --date "$old +0000" 128 && -+ test_tick && ++ GIT_COMMITTER_DATE="$new +0000" && ++ export GIT_COMMITTER_DATE && ++ test_commit_bulk --message="new" --notick 128 && ++ ++ GIT_COMMITTER_DATE="$old +0000" && ++ export GIT_COMMITTER_DATE && ++ test_commit_bulk --message="old" --notick 128 && + + tag_everything && + @@ t/t5333-pseudo-merge-bitmaps.sh (new) + mid="1654059600" && # 2022-06-01 + old="1641013200" && # 2022-01-01 + -+ test_commit_bulk --message="mid" --date "$mid +0000" 128 && -+ test_tick && ++ GIT_COMMITTER_DATE="$mid +0000" && ++ export GIT_COMMITTER_DATE && ++ test_commit_bulk --message="mid" --notick 128 && + + git for-each-ref --format="delete %(refname)" refs/tags >in && + git update-ref --stdin <in && 20: a34a60c3ef8 = 21: 9240b06a7d8 pack-bitmap: extra trace2 information 21: da2fb5b4b48 = 22: 625596a1432 ewah: `bitmap_equals_ewah()` 22: ff21247281f ! 23: fdd506d4544 pseudo-merge: implement support for finding existing merges @@ t/t5333-pseudo-merge-bitmaps.sh: test_expect_success 'pseudo-merge overlap stale + stable="1641013200" && # 2022-01-01 + unstable="1672549200" && # 2023-01-01 + -+ for date in $stable $unstable -+ do -+ test_commit_bulk --date "$date +0000" 128 && -+ test_tick || return 1 -+ done && ++ GIT_COMMITTER_DATE="$stable +0000" && ++ export GIT_COMMITTER_DATE && ++ test_commit_bulk --notick 128 && ++ GIT_COMMITTER_DATE="$unstable +0000" && ++ export GIT_COMMITTER_DATE && ++ test_commit_bulk --notick 128 && + + tag_everything && + 23: 6a6d88fa512 ! 24: cf0316ad0e9 t/perf: implement performace tests for pseudo-merge bitmaps @@ Metadata Author: Taylor Blau <me@xxxxxxxxxxxx> ## Commit message ## - t/perf: implement performace tests for pseudo-merge bitmaps + t/perf: implement performance tests for pseudo-merge bitmaps Implement a straightforward performance test demonstrating the benefit of pseudo-merge bitmaps by measuring how long it takes to count @@ Commit message Test this tree ----------------------------------------------------------------------------------- - 5333.2: git rev-list --count --all --objects (no bitmaps) 3.46(3.37+0.09) - 5333.3: git rev-list --count --all --objects (no pseudo-merges) 0.13(0.11+0.01) + 5333.2: git rev-list --count --all --objects (no bitmaps) 3.54(3.45+0.08) + 5333.3: git rev-list --count --all --objects (no pseudo-merges) 0.43(0.40+0.03) 5333.4: git rev-list --count --all --objects (with pseudo-merges) 0.12(0.11+0.01) + On a private repository which is much larger, and has many spikey parts + of history that aren't merged into the 'master' branch, the results are + as follows: + + Test this tree + --------------------------------------------------------------------------------------- + 5333.1: git rev-list --count --all --objects (no bitmaps) 122.29(121.31+0.97) + 5333.2: git rev-list --count --all --objects (no pseudo-merges) 21.88(21.30+0.58) + 5333.3: git rev-list --count --all --objects (with pseudo-merges) 5.05(4.77+0.28) + Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> ## t/perf/p5333-pseudo-merge-bitmaps.sh (new) ## @@ t/perf/p5333-pseudo-merge-bitmaps.sh (new) +' + +test_perf 'git rev-list --count --all --objects (no pseudo-merges)' ' -+ GIT_TEST_USE_PSEDUO_MERGES=0 \ ++ GIT_TEST_USE_PSEUDO_MERGES=0 \ + git rev-list --objects --all --use-bitmap-index +' + +test_perf 'git rev-list --count --all --objects (with pseudo-merges)' ' -+ GIT_TEST_USE_PSEDUO_MERGES=1 \ ++ GIT_TEST_USE_PSEUDO_MERGES=1 \ + git rev-list --objects --all --use-bitmap-index +' + base-commit: bf65967764f34adc2ca00d4c8195840ad3e4e127 -- 2.45.1.175.gcf0316ad0e9