Re: [GSoC] Blog post on reachability queries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> writes:

> Greetings everyone!
>
> I am working on implementing Generation Number v2. I have written an
> article about reachability queries, which I feel is necessary background
> for understanding the project.

Thank you for your introductory work.  It is always nice to have the
problem to be solved well described.

That said, I am not sure if all this theoretical introduction is needed
to understand the reachability labeling that Git is using, and is the
centerpoint of "Generation Number v2", namely the generation number or
topological level.

> Here's the summary of article:
>
>> Reachability refers to the ability to get from one vertex to another
>> within a graph.
>>
>> Reachability queries are an interesting problem, improving performance
>> for many graph operations. Better and more sophisticated solutions are
>> being created as the size of working graphs keeps increasing.

One caveat: the solution that works for the scale-free small-world graph
like citation network might not work for the commit graph, and vice
versa.  Also, not all algorithms for static graph would work well for
constantly growing history graph (for commit graph).

But it is also what makes it interesting, in my opinion.

>>
>> Reachability for the undirected graph can be found in linear
>> preprocessing and constant query time with disjoint set unions. The
>> answer isn't as evident for a directed graph because of differing
>> performance on positive and negative queries, nature and size of graph
>> and other factors. Topological Levels, Post Order DFS Intervals and
>> Contraction Hierarchies are some of the building blocks for such
>> algorithms.

All right.

>
> In a later article, I will talk about the specifics of generation number
> for Git. In particular, how Git uses reachability queries, the need for
> Generation Number v2 i.e., _Corrected Commit Date With Strictly Monotonic
> Offset_ and other interesting tidbits I come across.

Actually, _Corrected Commit Date With Strictly Monotonic Offset_ variant
of generation number v2 is needed to have *backward compatibility*.  It
turned out that there was a bug in implementing versioning for
serialized commit-graph format, namely that unsupported (newer) versions
of file make Git crash, instead of just not using commit-graph file to
speed up operations.

But it seems that there is a workaround. It happens that if commit graph
file is missing the CDAT chunk, then Git wouldn't use it to speed up
operations, but would not hard fail.  So if we put generation number v2
in a chunk with different name, e.g. CDA2, we would not need backward
compatibility.  Then simpler (and benchmarked) variant of _Corrected
Date_ would be enough.

There are advantages and disadvantages of each approach (assuming that
the second one actually works as described).  The first is more complex,
but backward compatibile with respect to reading (using) commit-graph.
The second is simpler, but old Git using new format would get
performance regression.

> You can find the article here:
>
> https://abhishekkumar2718.github.io/programming/2020/05/20/reachability-queries.html

Regards,
--
Jakub Narębski




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux