Re: [RFC][GSoC] Implement Generation Number v2

Jakub Narebski <jnareb@xxxxxxxxx> · Tue, 24 Mar 2020 16:44:28 +0100

Junio C Hamano <gitster@xxxxxxxxx> writes:
> Jakub Narebski <jnareb@xxxxxxxxx> writes:
>
>> About moving commit data with generation number v2 to "CDA2" chunk: if
>> "CDAT" chunk is missing then (I think) old Git would simply not use
>> commit-graph file at all; it may crash, but I don't think so.  If "CDAT"
>> chunk has zero length... I don't know what would happen then, possibly
>> also old Git would simply not use commit-graph data at all.
>
> Yeah, if it makes it crash, then we cannot use that "missing CDAT"
> approach.

I have not tested this, but from reading the code it looks like a
missing "CDAT" chunk makes Git fail softly -- it would return NULL for
the commit-graph, and thus not use commit-graph data at all... which
might be too high a price (too high a performance penalty for old Git).

>> Putting generation number v2 into separate chunk (which might be called
>> "GEN2" or "OFFS"/"DOFF") has the disadvantage of increasing the on disk
>> size of the commit graph, and possibly also increasing memory
>> consumption (the latter depends on how it would be handled), but has the
>> advantage of being fully backward compatible.  Old Git would simply
>> use generation numbers v1 in "CDAT", new Git would use generation
>> numbers v2 in "OFFS" (combining commit creation date from "CDAT" and
>> offset from "OFFS"),
>
> Do we have an option *not* to record meaningful generation numbers
> in CDAT and have the current Git binaries understand and still use
> the rest of the graph file, while not using the optimizations that
> rely on having generation numbers?  If not, then the new version of
> Git that tries to be compatible with old one needs to compute both
> generation numbers, and we would need to keep the topological number
> for quite some time.

We can, as Derrick Stolee wrote, put zero (GENERATION_NUMBER_ZERO) for
the generation number.  Without generation number data we lose some of
the performance improvements, though.
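
To illustrate what is lost, here is a minimal sketch (in Python, not
Git's actual code) of a generation-based cutoff.  The helper name
cannot_reach() is hypothetical; the only thing taken from Git is that
GENERATION_NUMBER_ZERO means "no generation number recorded".  With
zeros everywhere the cutoff never fires, so every query falls back to
a full walk:

```python
GENERATION_NUMBER_ZERO = 0  # "no generation number recorded"


def cannot_reach(gen_from, gen_to):
    """Return True if generation numbers alone prove that the commit with
    generation gen_from cannot reach the commit with generation gen_to.

    Hypothetical helper for illustration.  It relies on the invariant that
    a commit has a strictly larger generation number than each ancestor.
    """
    if GENERATION_NUMBER_ZERO in (gen_from, gen_to):
        # No usable data: we cannot prune, the caller has to walk.
        return False
    return gen_from < gen_to
```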

On the other hand, computing both generation number v1 (topological
level) and generation number v2 ([monotonic] offset for corrected commit
date) should not be much more costly than calculating a single
generation number, assuming that most of the cost is walking the commit
graph.  But this would need benchmarking.
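
As a rough sketch of why one walk should suffice, the Python below
computes both values in a single post-order traversal.  It is not the
Git implementation.  It assumes the plain definition of corrected commit
date (the larger of the commit's own date and one more than the largest
corrected date among its parents), ignores the "monotonic" variant, and
uses a hypothetical function name and dict-based input:

```python
def compute_generations(parents, commit_date):
    """Compute topological level (v1) and date offset (v2) in one walk.

    parents:     dict mapping commit -> list of parent commits
    commit_date: dict mapping commit -> committer date (integer)
    Returns (level, offset), two dicts keyed by commit.
    Assumes the history is acyclic and every parent is a key of `parents`.
    """
    level = {}      # generation number v1
    corrected = {}  # corrected commit date
    for start in parents:
        if start in level:
            continue
        stack = [start]
        while stack:
            c = stack[-1]
            pending = [p for p in parents[c] if p not in level]
            if pending:
                # Parents first: both numbers depend only on the parents.
                stack.extend(pending)
                continue
            stack.pop()
            if c in level:
                continue  # was pushed more than once, already done
            level[c] = 1 + max((level[p] for p in parents[c]), default=0)
            corrected[c] = max(
                [commit_date[c]] + [corrected[p] + 1 for p in parents[c]]
            )
    # What would be stored for v2 is the offset from the commit date.
    offset = {c: corrected[c] - commit_date[c] for c in corrected}
    return level, offset
```

For example, with a root A (date 100), B on top of A with a skewed date
of 90, C on top of A (date 110), and a merge D of B and C (date 105),
this gives levels A=1, B=2, C=2, D=3 and offsets A=0, B=11, C=0, D=6.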

Also, as Stolee wrote, with generation number v2 in a separate chunk
the commit data is no longer stored together, but split into two areas.

>> and there should be no problems with updating
>> commit-graph file (either rewriting, or adding new commit-graph to the
>> chain).
>
> Would merging by the current Git also work well (meaning, would
> "GEN2" or whatever it does not understand be omitted)?

From the analysis of write_commit_graph_file(), it looks like unknown
chunks are simply skipped (omitted), but I have not checked it in
practice.

Best,
-- 
Jakub Narębski