Re: [PATCH v4 00/13] Serialized Git Commit Graph

Stefan Beller <sbeller@xxxxxxxxxx> · Mon, 2 Apr 2018 11:02:49 -0700

> Currently, the format includes 8 bytes to share between the generation
> number and commit date. Due to alignment concerns, we will want to keep this
> as 8 bytes or truncate it to 4-bytes. Either we would be wasting at least 3
> bytes or truncating dates too much (presenting the 2038 problem [1] since
> dates are signed).

Good point. I forgot about them while writing the previous email.
That is reason enough to keep the generation numbers, sorry
for the noise.
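
To make the space argument concrete, here is a minimal sketch of two values
sharing one 8-byte word. The 30/34 bit split and the names `pack`/`unpack`
are assumptions for illustration only, not taken from the patch series.

```python
# Assumed split for illustration: the series only says 8 bytes are shared.
GENERATION_BITS = 30
DATE_BITS = 64 - GENERATION_BITS  # 34 bits left for the commit date


def pack(generation, commit_date):
    """Pack a generation number and a commit date into one 64-bit word."""
    if not 0 <= generation < (1 << GENERATION_BITS):
        raise ValueError("generation number does not fit")
    if not 0 <= commit_date < (1 << DATE_BITS):
        raise ValueError("commit date does not fit")
    return (generation << DATE_BITS) | commit_date


def unpack(word):
    """Return (generation, commit_date) from a packed 64-bit word."""
    return word >> DATE_BITS, word & ((1 << DATE_BITS) - 1)


# 2**31 seconds is one past what a signed 4-byte date can hold (the 2038
# limit), yet it still round-trips through the shared 8-byte word.
word = pack(12345, 2**31)
generation, commit_date = unpack(word)
```

With this assumed split the date field keeps working for centuries past
2038, and the whole word stays 8-byte aligned.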

>
>> I only glanced at the paper, but it looks like a "more advanced 2d
>> generation number" that seems to be able to answer questions
>> that gen numbers can answer, but that paper also refers
>> to SCARAB as well as GRAIL as the state of the art, so maybe
>> there are even more papers to explore?
>
>
> The biggest reason I can say to advance this series (and the small follow-up
> series that computes and consumes generation numbers) is that generation
> numbers are _extremely simple_. You only need to know your parents and their
> generation numbers to compute your own. These other reachability indexes
> require examining the entire graph to create "good" index values.

Yes, that is a good point, too. Generation numbers can be computed
"commit locally" and do not need expensive setups, which the others
presumably need.
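
A minimal sketch of that "commit local" computation, assuming the history
is given as a dict from commit name to its list of parents, that every
parent appears as a key, and that root commits get generation 1 (the
function name and the convention for roots are my assumptions):

```python
def generation_numbers(parents):
    """Map each commit to 1 + max(generation of its parents).

    Commits without parents get 1 (assumed convention). The walk is
    iterative so deep histories do not hit the recursion limit.
    """
    gen = {}
    for start in parents:
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            # Parents whose generation number is not known yet.
            pending = [p for p in parents[commit] if p not in gen]
            if pending:
                stack.extend(pending)
            else:
                # Only the parents' numbers are needed, nothing else.
                known = (gen[p] for p in parents[commit])
                gen[commit] = 1 + max(known, default=0)
                stack.pop()
    return gen


# Hypothetical history: B and C branch off A, D merges them, E follows D.
history = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}
gen = generation_numbers(history)
```

Since a commit's number is always larger than the number of every
ancestor, gen[x] <= gen[y] with x != y means y cannot be reachable from
x, which is the cheap negative answer a revision walk can use.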

> The hard part about using generation numbers (or any other reachability
> index) in Git is refactoring the revision-walk machinery to take advantage
> of them; current code requires O(reachable commits) to topo-order instead of
> O(commits that will be output). I think we should table any discussion of
> these advanced indexes until that work is done and a valuable comparison can
> be done. "Premature optimization is the root of all evil" and all that.

Agreed.

Stefan