Re: [PATCH 2/3] commit: write commit-graph with bloom filters

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/12/2020 4:51 PM, Taylor Blau wrote:
> On Sat, Apr 11, 2020 at 02:57:18PM -0700, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>>
>>> diff --git a/builtin/commit.c b/builtin/commit.c
>>> index d3e7781e658..d2fdfdc4363 100644
>>> --- a/builtin/commit.c
>>> +++ b/builtin/commit.c
>>> @@ -1701,7 +1701,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
>>>  		      "not exceeded, and then \"git restore --staged :/\" to recover."));
>>>
>>>  	if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
>>> -	    write_commit_graph_reachable(the_repository->objects->odb, 0, NULL))
>>> +	    write_commit_graph_reachable(the_repository->objects->odb,
>>> +					 git_env_bool(GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS, 0) ? COMMIT_GRAPH_WRITE_BLOOM_FILTERS : 0,
>>> +					 NULL))
>>
>> Yuck.  That is doubly mouthful.
> 
> Yeah, this is quite a mouthful indeed. I think the most conservative fix
> would be something like:
> 
>   if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0)) {
>     enum commit_graph_write_flags flags = 0;
>     if (git_env_bool(GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS, 0))
>       flags |= COMMIT_GRAPH_WRITE_BLOOM_FILTERS;
> 
>     if (write_commit_graph_reachable(the_repository->objects->odb,
>                                      flags, NULL)) {
>       /* ... */
>     }
>   }
> 
> and then ripping this knob out (as Junio suggests below, which I think
> is a wise suggestion) out would be straightforward.
> 
>> I wonder how much value we are getting by having this call here in
>> the first place.  This function being cmd_commit(), by definition we
>> won't be updating the graph file unless the test script does "git
>> commit".  We won't update the graph file upon "git rebase", "git
>> merge", "git pull", "git fetch", etc., so it is not like with this
>> hack, the test coverage for various traversal using the graph file
>> would magically be perfect.  We'd still need an explicit call to
>> "commit-graph write" at strategic places in the test scripts, no?
> 
> Yeah. It's definitely not a silver bullet for the reasons you mention,
> but I think that it is helping out in some common cases. For example,
> if we moved this check out to 'test_commit', then we'd be relying on
> tests to use that instead of 'git commit'. On the other hand, this
> catches either, since they both wind up in this builtin.
> 
> I guess you could push this check down much further to when we're about
> to write a new object, and--if that new object is a commit--update the
> commit-graph. That sounds awfully slow (and feels to me like a hack, but
> I can't justify whether it's more or less of a hack than we already
> have), but I think it would be OK, since this isn't the normal way to
> run tests.

If we keep this simple, or extract the process to a
"write_commit_graph_for_tests()" macro inside builtin.h, then we could
insert a commit-graph write in more places.

However, I think there is value in testing the "not every commit is
included in the commit-graph" case, which our current setup does quite
nicely. The one exception is that 'git merge' could benefit from this
snippet, so we cap off the commit-graph whenever a test script
"constructs" a repository to match a data shape.

>> How are we testing with and without bitmaps, and do we have a kludge
>> like this one for the feature, or do we use a different mechanism
>> to help writing tests easier to gain better coverage?
> 
> As far as I know after a brief search, we have no similar such mechanism
> for bitmaps, so commit-graphs are a unique instance.

I would not be against a similar mechanism for bitmaps, but this is
really important for commit-graphs. The difference lies in which users
have commit-graphs versus bitmaps. Now that commit-graphs are on by
default and written by 'git gc', client users get commit-graphs unless
they opt-out. We need to test as many scenarios as possible to avoid
causing them disruption.

Who uses bitmaps? Primarily Git servers. The scenarios run on those
repos are much more restricted, and those scenarios are tested with
bitmaps explicitly.

>> In any case, I can see why we want a separate knob specific to the
>> CHANGED_PATHS until the path filter part becomes as stable as the
>> rest of the graph file, but after some time, we should remove that
>> knob, and at that point, we always use the path filter whenever we
>> use commit-graph, so that we won't have to suffer from the ugliness.
>>
>> Wait.  I wonder if we can tweak the arrangement a bit so that this
>> layer does not need to know any more than GIT_TEST_COMMIT_GRAPH?
>>
>> For example, in commit-graph.c::write_commit_graph(), can we test
>> the CHANGED_PATHS environment variable and flip the .changed_paths
>> bit, no matter what the 'flags' argument to the function says?
> 
> It may be cleaner to have 'GIT_TEST_COMMIT_GRAPH' specify the flags that
> it wants as its value, but the additional coupling makes me somewhat
> uncomfortable.

It would be easy to insert it here: 

diff --git a/commit-graph.c b/commit-graph.c
index 77668629e2..3e8f36ce5c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1968,6 +1968,8 @@ int write_commit_graph(struct object_directory *odb,
        ctx->check_oids = flags & COMMIT_GRAPH_WRITE_CHECK_OIDS ? 1 : 0;
        ctx->split_opts = split_opts;
        ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0;
+       if (git_env_bool(GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS, 0))
+               ctx->changed_paths = 1;
        ctx->total_bloom_filter_data_size = 0;
 
        if (ctx->split) {

The problem here is that if GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1 and
someone runs "git commit-graph write --no-changed-paths" then the
negation of that option is ignored. But this is a GIT_TEST_* variable,
and any test that requires that check could disable the enviroment
variable first.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux