Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> writes: > Greetings Jakub > >> So perhaps we should expand "Commit graph labeling for speeding up git >> commands" idea, splitting it into two possible ways of working in this >> project: the more technical 'Generation number v2', and 'Interval labels >> for commit graph' which is more of research project? Which should be >> put first, then? > > I would suggest working on generation number v2 first because: > - We ship improved performance *sooner*. > - I find it easier to shift from simpler to more complex systems. It would be also more of an engineering project, than research one. > On a personal note, I would be able to do a better job at working on generation > number v2 than on interval labels. While reading through the papers mentioned > was fun and full of a-ha moments, I find building things more > fulfilling. I would be glad if either you or Derrick opt for mentoring it. All right. I have added 'Generation number v2' as possible way of working on the "Commit graph labeling for speeding up git commands" idea. https://git.github.io/SoC-2020-Ideas/#commit-graph-labeling-for-speeding-up-git-commands > You could read my proposal at the link below. It is very rough and I haven't > proofread it yet. I will send out a more formal proposal once the direction > of this project is decided. > > https://github.com/abhishekkumar2718/GSoC20/blob/master/graph_labelling.md > > **Too long, didn't read** I will comment only on the TLDR version below. > - Commit graphs are small to medium-sized (compared to problem sizes observed in > graph-theory literature) sparse graphs. They have unusual properties compared > to more conventional graphs that can be exploited for better performance. Right. > - Most of git's reachability queries are negative and using a negative-cut > filter improves performance more than a positive-cut filter. Not exactly. Most time consuming git graph queries are those that return a subgraph or otherwise need to walk the commit graph (like topological sorting, or ahead/behind numbers). Here in most cases negative-cut filters, that reduce the number of commits walked, improve performance more (with rare exceptions, like --ancestry-path query). That is what I think, but it is not something I have proven... > - Implementing and maintaining a two-dimensional reachability index is hard > and does not offer justifiable performance gains. I think it would be better to not mention graph reachability algorithms at all, and just talk about generation number v2 as straight performance improvement over generation number v1. > - We plan to use corrected commit date as the generation number v2 because it is > locally computable, immutable and can be incrementally updated. Unfortunately because of an oopsie with respect to commit-graph file format versioning, at least for the time being generation number v2 needs to be also backward compatibile with generation number v1. See for example: https://git.github.io/rev_news/2019/06/28/edition-52/#reviews https://speakerdeck.com/jnareb/graph-operations-in-git-and-how-to-make-them-faster?slide=74 > > - If git ever considers a two-dimensional reachability index, either post order > DFS, GRAIL or an index based on commit date would be good starting > places to explore. > > I go in more details about GRAIL, FERRARI and PReaCH, explaining > briefly how they work, their advantages, and disadvantages. Again, I don't think that is necessary. Best, -- Jakub Narębski