Garima Singh <garimasigit@xxxxxxxxx> writes:
> On 2/17/2020 4:56 PM, Jakub Narebski wrote:
>> "Garima Singh via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

[...]

>>> ---
>>>  commit-graph.c | 32 +++++++++++++++++++++++++++++++-
>>>  commit-graph.h | 3 ++-
>>>  2 files changed, 33 insertions(+), 2 deletions(-)
>>
>> It would be good to have at least a sanity check of this feature,
>> perhaps one that would check that the number of per-commit Bloom
>> filters on the slab matches the number of commits in the commit-graph.
>
> The combination of all the e2e tests in this series with the test
> flag being turned on in the CI, and the performance gains we are
> seeing, confirm that this is happening correctly.

Well, the advantage of unit tests over e2e functional tests is that
they can pinpoint the source of a bug much more easily. That said,
I don't think there is an absolute need for unit tests here, though
it would be nice to have them.

>>>
>>>  	const struct split_commit_graph_opts *split_opts;
>>> +	uint32_t total_bloom_filter_data_size;
>>
>> This is the total size of the Bloom filter data, in bytes, which will
>> later be used as the BDAT chunk size. However, the commit-graph format
>> uses 8 bytes for a byte offset, not 4. Why is it uint32_t and not
>> uint64_t, then?
>
> Changed to size_t. Thanks for noticing!

Right, this is a local value (size_t may be a different size on
different architectures), even though it will be stored indirectly in
the chunk lookup table as a pair of uint64_t offsets.

>>>  };
>>>
>>>  static void write_graph_chunk_fanout(struct hashfile *f,
>>> @@ -1140,6 +1143,28 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
>>>  	stop_progress(&ctx->progress);
>>>  }
>>>
>>> +static void compute_bloom_filters(struct write_commit_graph_context *ctx)
>>> +{
>>> +	int i;
>>> +	struct progress *progress = NULL;
>>> +
>>> +	load_bloom_filters();
>>> +
>>> +	if (ctx->report_progress)
>>> +		progress = start_progress(
>>> +			_("Computing commit diff Bloom filters"),
>>> +			ctx->commits.nr);
>>> +
>>
>> Shouldn't we initialize ctx->total_bloom_filter_data_size to 0 here?
>> We cannot use compute_bloom_filters() to _update_ Bloom filter data,
>> I think -- we don't distinguish here between new and existing data
>> (where the existing data size would already be included in the total
>> Bloom filter size). At least I don't think so.
>
> This line in commit-graph.c takes care of reinitializing the graph
> context and, by consequence, the Bloom filter data size:
>
>     ctx = xcalloc(1, sizeof(struct write_commit_graph_context));
>
> So the total size gets recalculated every time, which is correct.

True, I had missed this.

Best,
-- 
Jakub Narębski
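
P.S. To make the offset point above concrete: the chunk lookup table of
the commit-graph file stores 8-byte file offsets, so the in-memory
size_t accumulator only ever reaches the disk after being widened to
uint64_t. A throwaway sketch follows; the function and variable names
are made up for illustration and are not the code from the patch: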
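
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Illustration only: the BDAT chunk size is never written to disk as a
 * 4-byte value; it is implied by the difference between two 8-byte
 * offsets in the chunk lookup table.
 */
static uint64_t next_chunk_offset(uint64_t chunk_start,
				  size_t total_bloom_filter_data_size)
{
	/* the size_t accumulator is widened to the on-disk offset type here */
	return chunk_start + (uint64_t)total_bloom_filter_data_size;
}

int main(void)
{
	size_t bdat_size = 12345;      /* pretend accumulated BDAT size */
	uint64_t bdat_start = 0x10000; /* pretend start of the BDAT chunk */

	printf("next chunk would start at offset %llu\n",
	       (unsigned long long)next_chunk_offset(bdat_start, bdat_size));
	return 0;
}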
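
And to spell out the xcalloc() point: git's xcalloc() is a calloc()
wrapper that dies on allocation failure, and calloc() returns zeroed
memory, so every commit-graph write starts with
total_bloom_filter_data_size == 0 even without an explicit reset in
compute_bloom_filters(). Below is a stripped-down, stand-alone
illustration; struct write_ctx is a stand-in for the real
write_commit_graph_context, reduced to the one field under discussion: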
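
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* stand-in for struct write_commit_graph_context, reduced to one field */
struct write_ctx {
	size_t total_bloom_filter_data_size;
};

int main(void)
{
	/* calloc(), like git's xcalloc() wrapper, returns zeroed memory */
	struct write_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return 1;

	/* so the accumulator starts from 0 on every write */
	assert(ctx->total_bloom_filter_data_size == 0);

	free(ctx);
	return 0;
}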