Re: [PATCH v3 09/11] commit-graph: use generation v2 only if entire chain does

jnareb@xxxxxxxxx (Jakub Narębski) · Wed, 26 Aug 2020 12:38:04 +0200

Hi Abhishek,

Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> writes:
> On Sat, Aug 22, 2020 at 07:14:38PM +0200, Jakub Narębski wrote:
>> Hi Abhishek,
>>
>> ...
>>
>> However the commit message do not say anything about the *writing* side.
>>
>
> Revised the commit message to include the following at the end:
>
> When writing the new layer in split commit-graph, we write a GDAT chunk
> only if the topmost layer has a GDAT chunk. This guarantees that if a
> layer has GDAT chunk, all lower layers must have a GDAT chunk as well.
>

All right.

> Rewriting layers follows similar approach: if the topmost layer below
> set of layers being rewritten (in the split commit-graph chain) exists,
> and it does not contain GDAT chunk, then the result of rewrite does not
> have GDAT chunks either.

All right.

I see that you went with proposed more complex (but better) solution...

>>
>> ...
>>
>> To be more detailed, without '--split=replace' we would want the following
>> layer merging behavior:
>>
>>    [layer with GDAT][with GDAT][without GDAT][without GDAT][without GDAT]
>>            1              2           3             4            5
>>
>> In the split commit-graph chain above, merging two topmost layers
>> (layers 4 and 5) should create a layer without GDAT; merging three
>> topmost layers (and any other layers, e.g. two middle ones, i.e. 3 and
>> 4) should create a new layer with GDAT.

A simpler solution would be to create a new merged layer without GDAT if
any of the layers being merged do not have GDAT.

In this solution merging 3+4+5, 3+4, and even 2+3 would result with
layer without GDAT, and only merging 1+2 would result in layer with GDAT.

>>
>>    [layer with GDAT][with GDAT][without GDAT][-------without GDAT-------]
>>            1              2           3               merged
>>
>>    [layer with GDAT][with GDAT][-------------with GDAT------------------]
>>            1              2                    merged
>>
>> I hope those ASCII-art pictures help understanding it
>>
>
> Thanks! There were helpful.
>
> While we work as expected in the first scenario i.e merging 4 and 5, we
> would *still* write a layer without GDAT in the second scenario.
>
> I have tweaked split_graph_merge_strategy() to fix this:
>
> ----------------------------------------------
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 6d54d9a286..246fad030d 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1973,6 +1973,9 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
>  		}
>  	}
>
> +	if (!ctx->write_generation_data && g->chunk_generation_data)
> +		ctx->write_generation_data = 1;
> +
>  	if (flags != COMMIT_GRAPH_SPLIT_REPLACE)
>  		ctx->new_base_graph = g;
>  	else if (ctx->num_commit_graphs_after != 1)

...which turned out to be not that complicated.  Nice work!

Though this needs tests that if fulfills the stated condition (because I
am not sure if it is entirely correct: we are not checking the layer
below current one, isn't it?... ah, you explain it below).

One possible solution would be to grep `test-tool read-graph` output for
"^chunks: ", then pass it through `uniq` (without `sort`!), check that
the number of lines is less or equal 2, and if there are two lines then
check that we get the following contents:

  chunks: oid_fanout oid_lookup commit_metadata generation_data
  chunks: oid_fanout oid_lookup commit_metadata

(assuming that information about layers is added in top-down order).

This test must be run with GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0, which
I think is the default.

> ----------------------------------------------------
>
> That is, if we were not writing generation data (because of mixed
> generation concerns) but the new topmost layer has a generation data
> chunk, we have merged all layers without GDAT chunk and can now write a
> GDAT chunk safely.

All right.

[...]
>>> diff --git a/commit-graph.h b/commit-graph.h
>>> index f78c892fc0..3cf89d895d 100644
>>> --- a/commit-graph.h
>>> +++ b/commit-graph.h
>>> @@ -63,6 +63,7 @@ struct commit_graph {
>>>  	struct object_directory *odb;
>>>
>>>  	uint32_t num_commits_in_base;
>>> +	uint32_t read_generation_data;
>>>  	struct commit_graph *base_graph;
>>>
>>
>> First, why `read_generation_data` is of uint32_t type, when it stores
>> (as far as I understand it), a "boolean" value of either 0 or 1?
>
> Yes, using unsigned int instead of uint32_t (although in most of cases
> it would be same).  If commit_graph had other flags as well, we could
> have used a bit field.

OK.

>> Second, couldn't we simply set chunk_generation_data to NULL?  Or would
>> that interfere with the case of rewriting, where we want to use existing
>> GDAT data when writing new commit-graph with GDAT chunk?
>
> It interferes with rewriting the split commit-graph, as you might have
> guessed from the above code snippet.

All right.

[...]
>>> @@ -885,6 +908,7 @@ void load_commit_graph_info(struct repository *r, struct commit *item)
>>>  	uint32_t pos;
>>>  	if (!prepare_commit_graph(r))
>>>  		return;
>>> +
>>>  	if (find_commit_in_graph(item, r->objects->commit_graph, &pos))
>>>  		fill_commit_graph_info(item, r->objects->commit_graph, pos);
>>>  }
>>
>> This is unrelated whitespace fix, a "while at it" in neighbourhood of
>> changes.  All right then.
>>
>
> Reverted this change, as it's unimportant.

Actually I am not against fixing the whitespace in the neighbourhood of
changes, so you can keep it or revert it (discard).

>>> @@ -2192,6 +2216,9 @@ int write_commit_graph(struct object_directory *odb,
>>
>> ...
>>
>> It would be nice to have an example with merging layers (whether we
>> would handle it in strict or relaxed way).
>>
>
> Sure, will add.

Thanks.

Best,
--
Jakub Narębski