"Garima Singh via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: This is a second part of my response, focusing solely on tests of the Bloom filters feature. [...] > diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c > index d2884efe0a..aff597c7a3 100644 > --- a/t/helper/test-read-graph.c > +++ b/t/helper/test-read-graph.c > @@ -45,6 +45,10 @@ int cmd__read_graph(int argc, const char **argv) > printf(" commit_metadata"); > if (graph->chunk_extra_edges) > printf(" extra_edges"); > + if (graph->chunk_bloom_indexes) > + printf(" bloom_indexes"); > + if (graph->chunk_bloom_data) > + printf(" bloom_data"); > printf("\n"); > All right, that is simple extension of 'test-helper read-graph'. > UNLEAK(graph); > diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh > new file mode 100755 > index 0000000000..19eca1864b > --- /dev/null > +++ b/t/t4216-log-bloom.sh > @@ -0,0 +1,140 @@ > +#!/bin/sh > + > +test_description='git log for a path with bloom filters' > +. ./test-lib.sh > + > +test_expect_success 'setup test - repo, commits, commit graph, log outputs' ' > + git init && > + mkdir A A/B A/B/C && > + test_commit c1 A/file1 && > + test_commit c2 A/B/file2 && > + test_commit c3 A/B/C/file3 && > + test_commit c4 A/file1 && > + test_commit c5 A/B/file2 && > + test_commit c6 A/B/C/file3 && > + test_commit c7 A/file1 && > + test_commit c8 A/B/file2 && > + test_commit c9 A/B/C/file3 && > + git checkout -b side HEAD~4 && > + test_commit side-1 file4 && > + git checkout master && > + git merge side && > + test_commit c10 file5 && Unfortunately this might be not enough for Git's heuristic similarity based rename detection, as it creates 'file5' file with content 'c10'. [Checking something]. Well, actually it looks like it works, even with not much contents. I thought you would need to use something like + test_write_lines 1 2 3 4 5 6 7 8 9 >file5 && + git add file5 && + git commit -m c10 && But it turns out that it is, s far as I have checked, not necessary. > + mv file5 file5_renamed && > + git add file5_renamed && > + git commit -m "rename" && > + git commit-graph write --reachable --changed-paths > +' Hmmm... there is no test for file that was present in history but got deleted. Might be important (because of pre-image vs post-image name issues). Very minor issue: following the style used in t/test-lib-functions.sh and the style guide in CodingGuidelines, it should be +graph_read_expect () { and the same for the following functions. https://github.com/git/git/blob/master/Documentation/CodingGuidelines#L144 - We prefer a space between the function name and the parentheses, and no space inside the parentheses. The opening "{" should also be on the same line. (incorrect) my_function(){ ... (correct) my_function () { ... > +graph_read_expect() { > + OPTIONAL="" > + NUM_CHUNKS=5 > + cat >expect <<- EOF > + header: 43475048 1 1 $NUM_CHUNKS 0 > + num_commits: $1 > + chunks: oid_fanout oid_lookup commit_metadata bloom_indexes bloom_data Either OPTIONAL remains unused, and should be removed, or we leave it for possible future extension, and we write + chunks: oid_fanout oid_lookup commit_metadata bloom_indexes bloom_data$OPTIONAL like in t/t5318-commit-graph.sh. > + EOF > + test-tool read-graph >output && > + test_cmp expect output Why 'output', and not 'actual'? > +} > + > +test_expect_success 'commit-graph write wrote out the bloom chunks' ' > + graph_read_expect 13 > +' All right, that is sanity-checking 'git commit-graph write --changed-paths'. > + > +setup() { I wonder if we can come up with a better name... setup_log(), setup_log_bloom(), log_compare()? > + rm output This shouldn't be here, in this function. Or perhaps it shouldn't even be used at all; having 'output' doesn't hinder anything. > + rm "$TRASH_DIRECTORY/trace.perf" All right, this cleanup is needed. > + git -c core.commitGraph=false log --pretty="format:%s" $1 >log_wo_bloom > + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.perf" git -c core.commitGraph=true log --pretty="format:%s" $1 >log_w_bloom All right, we prepare for comparing version without Bloom filters (reference) and with Bloom filters, and for checking if Bloom filters were used. > +} This setup() function above is missing the && chain. It should then in my opinion read: +setup () { + rm "$TRASH_DIRECTORY/trace.perf" && + git -c core.commitGraph=false log --format="%s" $1 >log_wo_bloom && + GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.perf" \ + git -c core.commitGraph=true log --format="%s" $1 >log_w_bloom +} Also, perhaps we should add at the beginning of this test file, outside anu test_expect_success block, the following (see t/*trace2*.sh files): # Turn off any inherited trace2 settings for this test. sane_unset GIT_TRACE2 GIT_TRACE2_PERF GIT_TRACE2_EVENT sane_unset GIT_TRACE2_PERF_BRIEF sane_unset GIT_TRACE2_CONFIG_PARAMS > + > +test_bloom_filters_used() { > + log_args=$1 > + bloom_trace_prefix="statistics:{\"filter_not_present\":0,\"zero_length_filter\":0,\"maybe\"" > + setup "$log_args" Missing && chain. > + grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" && test_cmp log_wo_bloom log_w_bloom Why no line break after &&? > +} Ugh, examining JSON output with regexp is in my opinion quite fragile. Though I am not sure if requiring Perl and JSON module installed like t/t0212-trace2-event.sh is any better. > + > +test_bloom_filters_not_used() { > + log_args=$1 > + setup "$log_args" > + !(grep -q "statistics:{\"filter_not_present\":" "$TRASH_DIRECTORY/trace.perf") && test_cmp log_wo_bloom log_w_bloom We should also check that "$TRASH_DIRECTORY/trace.perf" file exist with test_path_is_file. Also, testing that something was not found is a bit fragile, but I don't have any better idea on how to do this test without negating grep exit value. > +} > + > +for path in A A/B A/B/C A/file1 A/B/file2 A/B/C/file3 file4 file5_renamed NOTE: file5 is missing from this list! I suspect that adding it might cause the test to fail. > +do > + for option in "" \ > + "--full-history" \ > + "--full-history --simplify-merges" \ > + "--simplify-merges" \ > + "--simplify-by-decoration" \ > + "--follow" \ > + "--first-parent" \ > + "--topo-order" \ > + "--date-order" \ > + "--author-date-order" \ > + "--ancestry-path side..master" > + do > + test_expect_success "git log option: $option for path: $path" ' > + test_bloom_filters_used "$option -- $path" All right, this tests that Bloom filters were used *and* that the command run with Bloom filters and without Bloom filters (without commit-graph) produces the same output. > + ' > + done > +done > + > +test_expect_success 'git log -- folder works with and without the trailing slash' ' > + test_bloom_filters_used "-- A" && > + test_bloom_filters_used "-- A/" > +' All right. I wonder if we should test for insane test case, namely pathname to an ordinary file that ends with slash: + test_bloom_filters_used "-- file4" && + test_bloom_filters_used "-- file4/" The latter should produce no output, being treated as not existing file. > + > +test_expect_success 'git log for path that does not exist. ' ' > + test_bloom_filters_used "-- path_does_not_exist" > +' All right. > + > +test_expect_success 'git log with --walk-reflogs does not use bloom filters' ' > + test_bloom_filters_not_used "--walk-reflogs -- A" > +' All right, but why is it so? > + > +test_expect_success 'git log -- multiple path specs does not use bloom filters' ' > + test_bloom_filters_not_used "-- file4 A/file1" > +' All right, though this is limitation of current code, not limitation of technique, so _maybe_ it would be better to test_expect_failure that for multiple pathspecs bloom_filters_used... > + > +test_expect_success 'git log with wildcard that resolves to a single path uses bloom filters' ' > + test_bloom_filters_used "-- *4" && > + test_bloom_filters_used "-- *renamed" > +' > + > +test_expect_success 'git log with wildcard that resolves to a multiple paths does not uses bloom filters' ' > + test_bloom_filters_not_used "-- *" && > + test_bloom_filters_not_used "-- file*" > +' Same here. > + > +test_expect_success 'setup - add commit-graph to the chain without bloom filters' ' > + test_commit c14 A/anotherFile2 && > + test_commit c15 A/B/anotherFile2 && > + test_commit c16 A/B/C/anotherFile2 && > + GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 git commit-graph write --reachable --split && > + test_line_count = 2 .git/objects/info/commit-graphs/commit-graph-chain > +' > + > +test_expect_success 'git log does not use bloom filters if the latest graph does not have bloom filters.' ' > + test_bloom_filters_not_used "-- A/B" > +' All right... though I would try to come up with a shorter test name :-) > + > +test_expect_success 'setup - add commit-graph to the chain with bloom filters' ' > + test_commit c17 A/anotherFile3 && > + git commit-graph write --reachable --changed-paths --split && > + test_line_count = 3 .git/objects/info/commit-graphs/commit-graph-chain > +' > + > +test_bloom_filters_used_when_some_filters_are_missing() { > + log_args=$1 > + bloom_trace_prefix="statistics:{\"filter_not_present\":3,\"zero_length_filter\":0,\"maybe\":6,\"definitely_not\":6" Perhaps a better solution would be to use (enhanced) 'test-tool bloom' to check which commits have Bloom filters and which do not. > + setup "$log_args" > + grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" && test_cmp log_wo_bloom log_w_bloom > +} Why broken && chain between setup() and the resr, and why && is not followed by line break (as before)? > + > +test_expect_success 'git log uses bloom filters if they exist in the latest but not all commit graphs in the chain.' ' > + test_bloom_filters_used_when_some_filters_are_missing "-- A/B" > +' > + > +test_done All right... though the description of this test is a bit long. Thank you for your work on this series. Best, -- Jakub Narębski