Re: [PATCH 8/9] revision.c: use bloom filters to speed up path based revision walks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/10/2020 7:27 PM, Jakub Narebski wrote:
> "Garima Singh via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>>  		case REV_TREE_SAME:
>>  			if (!revs->simplify_history || !relevant_commit(p)) {
>>  				/* Even if a merge with an uninteresting
>> @@ -3342,6 +3376,33 @@ static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
>>  	}
>>  }
>>  
>> +static void prepare_to_use_bloom_filter(struct rev_info *revs)
> 
> All right, I see that pointers to bloom_key and bloom_filter_settings
> were added to the rev_info struct.  I understand why the former is here,
> but the latter seems to be there just as a shortcut (to not owned data),
> which is fine but a bit strange.
> 
> Or is the latter here to allow for Bloom filter settings to possibly
> change from commit-graph file in the chain to commit-graph file, and
> thus from commit to commit?
> 

The latter. The idea is to keep the implementation open to that 
possibility. Load the bloom filter settings from the commit graph file 
you are dealing with and use those settings to fill out the bloom_key.

>> +	const char *path;
>> +	size_t len;
>> +
>> +	if (!revs->commits)
>> +	    return;
> 
> When revs->commits may be NULL?  I understand that we need to have this
> check because we use revs->commits->item next (sidenote: can revs ever
> be NULL?).
> 
> Would `git log --all -- <path>` use Bloom filters (as it theoretically
> could)?
> 

I am being defensive about revs->commits being NULL for the reason you 
called out. 

And yes, `git log --all <path>` does use Bloom filters if provided a 
single pathspec and when following the first parent.  

>> +
>> +	if (!revs->repo->objects->commit_graph)
>> +		return;
>> +
>> +	revs->bloom_filter_settings = revs->repo->objects->commit_graph->settings;
>> +	if (!revs->bloom_filter_settings)
>> +		return;
> 
> All right, so if there is no commit graph, or the commit graph does not
> include Bloom filter data, there is nothing to do.
> 
> Though I worry that it would make Git do not use Bloom filter if the top
> commit-graph in the chain does not include Bloom filter data, while
> other commit-graph files do (and Git could have used that information to
> speed up the file history query).
> 

Thanks for bringing this up! I need to test this scenario out more. 

>> +
>> +printf "c7\nc4\nc1" > expect_file1
> 
> Doing things outside test is discouraged.  We can create a separate test
> that creates those expect_file* files, or it can be a part of 'create
> commits' test.
> 
> Anyway, instead of doing test without Bloom filters (something that
> should have been tested already by other parts of testsuite), and then
> doing the same test with Bloom filter, why not compare that the result
> without and with Bloom filter is the same.  The t5318-commit-graph.sh
> test does this with help of graph_git_two_modes() function:
> 
>   graph_git_two_modes () {
>   	git -c core.commitGraph=true  $1 >output
>   	git -c core.commitGraph=false $1 >expect
>   	test_cmp expect output
>   }
> 
> Sidenote: I wonder if it is high time to create t/lib-commit-graph.sh
> helper with, among others, this common function.
>

Thank you! I have restructured the test to do almost everything you 
have suggested. 

Also, a note for this v1 RFC series: I am working on proper formal tests 
right now. I didn't want to wait to get these nailed down before sending 
out the RFC series and getting the ball rolling. 

I have taken note of all the testing suggestions you have made in your 
review. They are very helpful and I appreciate it! 

Creating t/lib-commit-graph.sh helper would be orthogonal to this series,
so I will follow up with a separate series that does this. 

Cheers! 
Garima Singh



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux