Taylor Blau <me@xxxxxxxxxxxx> writes: >> > In practice, these boundary commits likely occupy a small proportion of >> > the overall number of commits, and so the size penalty is likely smaller >> > than storing a bitmap for all commits. >> >> | Percentage of >> | commits modifying >> | 0 path | >= 512 paths >> ---------------+------------+---------------- >> android-base | 13.20% | 0.13% >> cmssw | 0.15% | 0.23% >> cpython | 3.07% | 0.01% >> elasticsearch | 0.70% | 1.00% >> gcc | 0.00% | 0.08% >> gecko-dev | 0.14% | 0.64% >> git | 0.11% | 0.02% >> glibc | 0.02% | 0.10% >> go | 0.00% | 0.07% >> homebrew-cask | 0.40% | 0.02% >> homebrew-core | 0.01% | 0.01% >> jdk | 0.26% | 5.64% >> linux | 0.01% | 0.51% >> llvm-project | 0.12% | 0.03% >> rails | 0.10% | 0.10% >> rust | 0.07% | 0.17% >> tensorflow | 0.09% | 1.02% >> webkit | 0.05% | 0.31% > > This is very useful information to have! Without the total number of > commits, it's impossible to know whether or not this is a win over the > BFXL chunk. But, since the number of commits is probably "large" versus > the percentage of boundary commits which is "small", it's almost > certainly an advantage. Do you want to include it in either the log message in one of the commits, in code comment, or a technical doc? > So, I think that if it's truly misleading, we could revisit this after > the topic is merged. But, I'm not planning on changing anything at this > point. If you do not want to help us go the last-mile to completion, that is sad, but I do not want to see a basically good patch stall like that, so let's find somebody else who can do the helping ;-) Here is what my trial rebase produced. I'll queue it to 'seen' (if you prefer I can send a full v3 patch, but I expect that you know how to fetch from 'seen' and review locally) after checking if the result passes the tests locally; an extra set of eyeballs to verify the result is pretty much appreciated. Thanks. 1: ca6c060171 ! 1: 9b294a0c66 bloom: encode out-of-bounds filters as non-empty @@ Commit message efficient than storing a bitmap for which commits haven't computed their Bloom filters. Storing a bitmap incurs a penalty of one bit per commit, whereas storing explicit filters as above incurs a penalty of one byte - per too-large or too-small commit. + per too-large or empty commit. In practice, these boundary commits likely occupy a small proportion of the overall number of commits, and so the size penalty is likely smaller @@ bloom.c: struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD; + if (!filter->len) { + if (computed) -+ *computed |= BLOOM_TRUNC_SMALL; ++ *computed |= BLOOM_TRUNC_EMPTY; + filter->len = 1; + } filter->data = xcalloc(filter->len, sizeof(unsigned char)); @@ bloom.h: enum bloom_filter_computed { BLOOM_NOT_COMPUTED = (1 << 0), BLOOM_COMPUTED = (1 << 1), BLOOM_TRUNC_LARGE = (1 << 2), -+ BLOOM_TRUNC_SMALL = (1 << 3), ++ BLOOM_TRUNC_EMPTY = (1 << 3), }; struct bloom_filter *get_or_compute_bloom_filter(struct repository *r, @@ commit-graph.c: struct write_commit_graph_context { int count_bloom_filter_computed; int count_bloom_filter_not_computed; -+ int count_bloom_filter_trunc_small; ++ int count_bloom_filter_trunc_empty; int count_bloom_filter_trunc_large; }; @@ commit-graph.c: static void trace2_bloom_filter_write_statistics(struct write_co ctx->count_bloom_filter_computed); trace2_data_intmax("commit-graph", ctx->r, "filter-not-computed", ctx->count_bloom_filter_not_computed); -+ trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-small", -+ ctx->count_bloom_filter_trunc_small); ++ trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-empty", ++ ctx->count_bloom_filter_trunc_empty); trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-large", ctx->count_bloom_filter_trunc_large); } @@ commit-graph.c: static void compute_bloom_filters(struct write_commit_graph_cont &computed); if (computed & BLOOM_COMPUTED) { ctx->count_bloom_filter_computed++; -+ if (computed & BLOOM_TRUNC_SMALL) -+ ctx->count_bloom_filter_trunc_small++; ++ if (computed & BLOOM_TRUNC_EMPTY) ++ ctx->count_bloom_filter_trunc_empty++; if (computed & BLOOM_TRUNC_LARGE) ctx->count_bloom_filter_trunc_large++; } else if (computed & BLOOM_NOT_COMPUTED) @@ t/t4216-log-bloom.sh: test_max_changed_paths () { grep "\"key\":\"filter-computed\",\"value\":\"$1\"" $2 } -+test_filter_trunc_small () { -+ grep "\"key\":\"filter-trunc-small\",\"value\":\"$1\"" $2 ++test_filter_trunc_empty () { ++ grep "\"key\":\"filter-trunc-empty\",\"value\":\"$1\"" $2 +} + test_filter_trunc_large () { @@ t/t4216-log-bloom.sh: test_expect_success 'correctly report changes over limit' ' +test_expect_success 'correctly report commits with no changed paths' ' -+ git init small && -+ test_when_finished "rm -fr small" && ++ git init empty && ++ test_when_finished "rm -fr empty" && + ( -+ cd small && ++ cd empty && + + git commit --allow-empty -m "initial commit" && + @@ t/t4216-log-bloom.sh: test_expect_success 'correctly report changes over limit' + git commit-graph write --reachable --changed-paths && + test_filter_computed 1 trace.event && + test_filter_not_computed 0 trace.event && -+ test_filter_trunc_small 1 trace.event && ++ test_filter_trunc_empty 1 trace.event && + test_filter_trunc_large 0 trace.event + ) +' 2: 11db600d51 = 2: 1b4c861e68 commit-graph: rename 'split_commit_graph_opts' 3: cf49598137 ! 3: d6c1bd395e builtin/commit-graph.c: introduce '--max-new-filters=<n>' @@ t/t4216-log-bloom.sh: test_expect_success 'correctly report commits with no chan + + test_filter_computed 2 trace.event && + test_filter_not_computed 3 trace.event && -+ test_filter_trunc_small 0 trace.event && ++ test_filter_trunc_empty 0 trace.event && + test_filter_trunc_large 0 trace.event + ) +' @@ t/t4216-log-bloom.sh: test_expect_success 'correctly report commits with no chan + --split=replace --max-new-filters=1 && + test_filter_computed 1 trace.event && + test_filter_not_computed 4 trace.event && -+ test_filter_trunc_small 0 trace.event && ++ test_filter_trunc_empty 0 trace.event && + test_filter_trunc_large 0 trace.event + ) +' @@ t/t4216-log-bloom.sh: test_expect_success 'correctly report commits with no chan + --changed-paths --max-new-filters=2 && + test_filter_computed 2 trace.event && + test_filter_not_computed 4 trace.event && -+ test_filter_trunc_small 2 trace.event && ++ test_filter_trunc_empty 2 trace.event && + test_filter_trunc_large 0 trace.event + done && + @@ t/t4216-log-bloom.sh: test_expect_success 'correctly report commits with no chan + --changed-paths --max-new-filters=2 && + test_filter_computed 0 trace.event && + test_filter_not_computed 6 trace.event && -+ test_filter_trunc_small 0 trace.event && ++ test_filter_trunc_empty 0 trace.event && + test_filter_trunc_large 0 trace.event + ) +' 4: 1bc82cd008 ! 4: b0d51fb04a commit-graph: introduce 'commitGraph.maxNewFilters' @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation is limited by --max- + --split=replace && test_filter_computed 1 trace.event && test_filter_not_computed 4 trace.event && - test_filter_trunc_small 0 trace.event && + test_filter_trunc_empty 0 trace.event && @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills previously-skipped filters' ' ) ' @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills previously + --max-new-filters=1 && + test_filter_computed 1 trace.event && + test_filter_not_computed 1 trace.event && -+ test_filter_trunc_small 0 trace.event && ++ test_filter_trunc_empty 0 trace.event && + test_filter_trunc_large 0 trace.event + ) +'