Re: [PATCH v2 10/13] bloom: encode out-of-bounds filters as non-empty

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Taylor Blau <me@xxxxxxxxxxxx> writes:

>> > In practice, these boundary commits likely occupy a small proportion of
>> > the overall number of commits, and so the size penalty is likely smaller
>> > than storing a bitmap for all commits.
>>
>>                  |      Percentage of
>>                  |    commits modifying
>>                  |   0 path   |  >= 512 paths
>>   ---------------+------------+----------------
>>   android-base   |   13.20%   |   0.13%
>>   cmssw          |    0.15%   |   0.23%
>>   cpython        |    3.07%   |   0.01%
>>   elasticsearch  |    0.70%   |   1.00%
>>   gcc            |    0.00%   |   0.08%
>>   gecko-dev      |    0.14%   |   0.64%
>>   git            |    0.11%   |   0.02%
>>   glibc          |    0.02%   |   0.10%
>>   go             |    0.00%   |   0.07%
>>   homebrew-cask  |    0.40%   |   0.02%
>>   homebrew-core  |    0.01%   |   0.01%
>>   jdk            |    0.26%   |   5.64%
>>   linux          |    0.01%   |   0.51%
>>   llvm-project   |    0.12%   |   0.03%
>>   rails          |    0.10%   |   0.10%
>>   rust           |    0.07%   |   0.17%
>>   tensorflow     |    0.09%   |   1.02%
>>   webkit         |    0.05%   |   0.31%
>
> This is very useful information to have! Without the total number of
> commits, it's impossible to know whether or not this is a win over the
> BFXL chunk. But, since the number of commits is probably "large" versus
> the percentage of boundary commits which is "small", it's almost
> certainly an advantage.

Do you want to include it in either the log message in one of the
commits, in code comment, or a technical doc?

> So, I think that if it's truly misleading, we could revisit this after
> the topic is merged. But, I'm not planning on changing anything at this
> point.

If you do not want to help us go the last-mile to completion, that
is sad, but I do not want to see a basically good patch stall like
that, so let's find somebody else who can do the helping ;-)

Here is what my trial rebase produced.  I'll queue it to 'seen' (if
you prefer I can send a full v3 patch, but I expect that you know
how to fetch from 'seen' and review locally) after checking if the
result passes the tests locally; an extra set of eyeballs to verify
the result is pretty much appreciated.

Thanks.


1:  ca6c060171 ! 1:  9b294a0c66 bloom: encode out-of-bounds filters as non-empty
    @@ Commit message
         efficient than storing a bitmap for which commits haven't computed their
         Bloom filters. Storing a bitmap incurs a penalty of one bit per commit,
         whereas storing explicit filters as above incurs a penalty of one byte
    -    per too-large or too-small commit.
    +    per too-large or empty commit.
     
         In practice, these boundary commits likely occupy a small proportion of
         the overall number of commits, and so the size penalty is likely smaller
    @@ bloom.c: struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
      		filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD;
     +		if (!filter->len) {
     +			if (computed)
    -+				*computed |= BLOOM_TRUNC_SMALL;
    ++				*computed |= BLOOM_TRUNC_EMPTY;
     +			filter->len = 1;
     +		}
      		filter->data = xcalloc(filter->len, sizeof(unsigned char));
    @@ bloom.h: enum bloom_filter_computed {
      	BLOOM_NOT_COMPUTED = (1 << 0),
      	BLOOM_COMPUTED     = (1 << 1),
      	BLOOM_TRUNC_LARGE  = (1 << 2),
    -+	BLOOM_TRUNC_SMALL  = (1 << 3),
    ++	BLOOM_TRUNC_EMPTY  = (1 << 3),
      };
      
      struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
    @@ commit-graph.c: struct write_commit_graph_context {
      
      	int count_bloom_filter_computed;
      	int count_bloom_filter_not_computed;
    -+	int count_bloom_filter_trunc_small;
    ++	int count_bloom_filter_trunc_empty;
      	int count_bloom_filter_trunc_large;
      };
      
    @@ commit-graph.c: static void trace2_bloom_filter_write_statistics(struct write_co
      			   ctx->count_bloom_filter_computed);
      	trace2_data_intmax("commit-graph", ctx->r, "filter-not-computed",
      			   ctx->count_bloom_filter_not_computed);
    -+	trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-small",
    -+			   ctx->count_bloom_filter_trunc_small);
    ++	trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-empty",
    ++			   ctx->count_bloom_filter_trunc_empty);
      	trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-large",
      			   ctx->count_bloom_filter_trunc_large);
      }
    @@ commit-graph.c: static void compute_bloom_filters(struct write_commit_graph_cont
      			&computed);
      		if (computed & BLOOM_COMPUTED) {
      			ctx->count_bloom_filter_computed++;
    -+			if (computed & BLOOM_TRUNC_SMALL)
    -+				ctx->count_bloom_filter_trunc_small++;
    ++			if (computed & BLOOM_TRUNC_EMPTY)
    ++				ctx->count_bloom_filter_trunc_empty++;
      			if (computed & BLOOM_TRUNC_LARGE)
      				ctx->count_bloom_filter_trunc_large++;
      		} else if (computed & BLOOM_NOT_COMPUTED)
    @@ t/t4216-log-bloom.sh: test_max_changed_paths () {
      	grep "\"key\":\"filter-computed\",\"value\":\"$1\"" $2
      }
      
    -+test_filter_trunc_small () {
    -+	grep "\"key\":\"filter-trunc-small\",\"value\":\"$1\"" $2
    ++test_filter_trunc_empty () {
    ++	grep "\"key\":\"filter-trunc-empty\",\"value\":\"$1\"" $2
     +}
     +
      test_filter_trunc_large () {
    @@ t/t4216-log-bloom.sh: test_expect_success 'correctly report changes over limit'
      '
      
     +test_expect_success 'correctly report commits with no changed paths' '
    -+	git init small &&
    -+	test_when_finished "rm -fr small" &&
    ++	git init empty &&
    ++	test_when_finished "rm -fr empty" &&
     +	(
    -+		cd small &&
    ++		cd empty &&
     +
     +		git commit --allow-empty -m "initial commit" &&
     +
    @@ t/t4216-log-bloom.sh: test_expect_success 'correctly report changes over limit'
     +			git commit-graph write --reachable --changed-paths &&
     +		test_filter_computed 1 trace.event &&
     +		test_filter_not_computed 0 trace.event &&
    -+		test_filter_trunc_small 1 trace.event &&
    ++		test_filter_trunc_empty 1 trace.event &&
     +		test_filter_trunc_large 0 trace.event
     +	)
     +'
2:  11db600d51 = 2:  1b4c861e68 commit-graph: rename 'split_commit_graph_opts'
3:  cf49598137 ! 3:  d6c1bd395e builtin/commit-graph.c: introduce '--max-new-filters=<n>'
    @@ t/t4216-log-bloom.sh: test_expect_success 'correctly report commits with no chan
     +
     +		test_filter_computed 2 trace.event &&
     +		test_filter_not_computed 3 trace.event &&
    -+		test_filter_trunc_small 0 trace.event &&
    ++		test_filter_trunc_empty 0 trace.event &&
     +		test_filter_trunc_large 0 trace.event
     +	)
     +'
    @@ t/t4216-log-bloom.sh: test_expect_success 'correctly report commits with no chan
     +				--split=replace --max-new-filters=1 &&
     +		test_filter_computed 1 trace.event &&
     +		test_filter_not_computed 4 trace.event &&
    -+		test_filter_trunc_small 0 trace.event &&
    ++		test_filter_trunc_empty 0 trace.event &&
     +		test_filter_trunc_large 0 trace.event
     +	)
     +'
    @@ t/t4216-log-bloom.sh: test_expect_success 'correctly report commits with no chan
     +					--changed-paths --max-new-filters=2 &&
     +			test_filter_computed 2 trace.event &&
     +			test_filter_not_computed 4 trace.event &&
    -+			test_filter_trunc_small 2 trace.event &&
    ++			test_filter_trunc_empty 2 trace.event &&
     +			test_filter_trunc_large 0 trace.event
     +		done &&
     +
    @@ t/t4216-log-bloom.sh: test_expect_success 'correctly report commits with no chan
     +				--changed-paths --max-new-filters=2 &&
     +		test_filter_computed 0 trace.event &&
     +		test_filter_not_computed 6 trace.event &&
    -+		test_filter_trunc_small 0 trace.event &&
    ++		test_filter_trunc_empty 0 trace.event &&
     +		test_filter_trunc_large 0 trace.event
     +	)
     +'
4:  1bc82cd008 ! 4:  b0d51fb04a commit-graph: introduce 'commitGraph.maxNewFilters'
    @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation is limited by --max-
     +				--split=replace &&
      		test_filter_computed 1 trace.event &&
      		test_filter_not_computed 4 trace.event &&
    - 		test_filter_trunc_small 0 trace.event &&
    + 		test_filter_trunc_empty 0 trace.event &&
     @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills previously-skipped filters' '
      	)
      '
    @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills previously
     +				--max-new-filters=1 &&
     +		test_filter_computed 1 trace.event &&
     +		test_filter_not_computed 1 trace.event &&
    -+		test_filter_trunc_small 0 trace.event &&
    ++		test_filter_trunc_empty 0 trace.event &&
     +		test_filter_trunc_large 0 trace.event
     +	)
     +'



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux