Re: [PATCH v4 00/13] Serialized Git Commit Graph

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/2/2018 1:35 PM, Stefan Beller wrote:
On Mon, Apr 2, 2018 at 8:02 AM, Derrick Stolee <stolee@xxxxxxxxx> wrote:
I would be happy to review any effort to extend the commit-graph
format to include such indexes, as long as the performance benefits
outweigh the complexity to create them.

[2]
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.719.8396&rep=rep1&type=pdf
The complexity of calculating FELINE index is O(|V| log(|V|) + |E|), the
storage complexity is 2*|V|.

This would be very easy to add as an optional chunk, since it can use one
row per commit.
Given this discussion, I wonder if we want to include generation numbers
as a first class citizen in the current format. They could also go as
an optional
chunk and we may want to discuss further if we want generation numbers or
FELINE numbers or GRAIL or SCARAB, which are all graph related speedup
mechanism AFAICT.
In case we decide against generation numbers in the long run,
the row of mandatory generation numbers would be dead weight
that we'd need to carry.

Currently, the format includes 8 bytes to share between the generation number and commit date. Due to alignment concerns, we will want to keep this as 8 bytes or truncate it to 4-bytes. Either we would be wasting at least 3 bytes or truncating dates too much (presenting the 2038 problem [1] since dates are signed).

I only glanced at the paper, but it looks like a "more advanced 2d
generation number" that seems to be able to answer questions
that gen numbers can answer, but that paper also refers
to SCARAB as well as GRAIL as the state of the art, so maybe
there are even more papers to explore?

The biggest reason I can say to advance this series (and the small follow-up series that computes and consumes generation numbers) is that generation numbers are _extremely simple_. You only need to know your parents and their generation numbers to compute your own. These other reachability indexes require examining the entire graph to create "good" index values.

The hard part about using generation numbers (or any other reachability index) in Git is refactoring the revision-walk machinery to take advantage of them; current code requires O(reachable commits) to topo-order instead of O(commits that will be output). I think we should table any discussion of these advanced indexes until that work is done and a valuable comparison can be done. "Premature optimization is the root of all evil" and all that.

Thanks,
-Stolee

[1] https://en.wikipedia.org/wiki/Year_2038_problem



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux