Jakub Narebski <jnareb@xxxxxxxxx> writes:

> A few questions:
> - is it too late to propose a new project idea for GSoC 2020?
> - is it too difficult of a project for GSoC?
> ...
> ### Graph labelling for speeding up git commands
>
>  - Language: C
>  - Difficulty: hard / difficult
>  - Possible mentors: Jakub Narębski

I am not running the GSoC or participating in it in any way other
than just being a reviewer-maintainer of the project, but I would
appreciate a well-thought-out write-up very much.

> Git uses various clever methods for making operations on very large
> repositories faster, from bitmap indices for git-fetch[1], to generation
> numbers (also known as topological levels) in the commit-graph file for
> commit graph traversal operations like `git log --graph`[2].
>
> One possible improvement that can make Git even faster is using min-post
> intervals labelling.  The basis of this labelling is post-visit order of
> a depth-first search traversal tree of a commit graph, let's call it
> 'post(v)'.
>
> If for each commit 'v' we compute and store in the commit-graph
> file two numbers: 'post(v)' and the minimum of 'post(u)' for all commits
> reachable from 'v', let's call the latter 'min_graph(v)', then the
> following condition is true:
>
>   if 'v' can reach 'u', then min_graph(v) <= post(u) <= post(v)
>
> Alternatively, for each commit 'v' we could compute and store in the
> commit-graph file two numbers: 'post(v)' and the minimum of 'post(u)'
> for commits that were visited during the part of depth-first search
> that started from 'v' (that is, the minimum post-order number in the
> subtree of the spanning tree rooted at 'v').  Let's call the latter
> 'min_tree(v)'.
> Then the following condition is true:
>
>   if min_tree(v) <= post(u) <= post(v), then 'v' can reach 'u'
>
> The task would be to implement computing such labelling (or a more
> involved variant of it[3][4]), storing it in commit-graph file, and
> using it for speeding up git commands (starting from a single chosen
> command) such as:
>
>  - git merge-base --is-ancestor A B
>  - git branch --contains A
>  - git tag --contains A
>  - git branch --merged A
>  - git tag --merged A
>  - git merge-base --all A B
>  - git log --topo-order
>
> References:
>
> 1. <http://githubengineering.com/counting-objects/>
> 2. <https://devblogs.microsoft.com/devops/supercharging-the-git-commit-graph-iii-generations/>
> 3. <https://arxiv.org/abs/1404.4465>
> 4. <https://github.com/steps/Ferrari>
>
> See also discussion in:
>
>   <https://public-inbox.org/git/86tvl0zhos.fsf@xxxxxxxxx/t/>
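
To make the two conditions quoted above concrete, here is a minimal
sketch of min-post labelling on a toy in-memory graph.  Everything in
it is an assumption made for illustration: `struct toy_graph`,
`toy_graph_label()` and `can_reach()` are hypothetical names, commits
are small integer ids, and nothing here reflects the actual
commit-graph file format or Git's internal API.  The min_graph test is
used as a negative cut (a definite "no"), the min_tree test as a
positive cut (a definite "yes"), and anything in between falls back to
walking parents.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_COMMITS 16
#define MAX_PARENTS 2

/* Hypothetical toy structure; Git's real commit-graph code looks different. */
struct toy_graph {
	int nr;                                /* commits have ids 0..nr-1 */
	int parents[MAX_COMMITS][MAX_PARENTS]; /* -1 marks an unused slot */
	int post[MAX_COMMITS];                 /* post(v), 1-based; 0 = unvisited */
	int min_tree[MAX_COMMITS];
	int min_graph[MAX_COMMITS];
	int counter;                           /* last post number handed out */
};

static void toy_graph_init(struct toy_graph *g, int nr)
{
	assert(nr >= 0 && nr <= MAX_COMMITS);
	g->nr = nr;
	g->counter = 0;
	for (int v = 0; v < nr; v++) {
		g->post[v] = g->min_tree[v] = g->min_graph[v] = 0;
		for (int i = 0; i < MAX_PARENTS; i++)
			g->parents[v][i] = -1;
	}
}

static void toy_graph_add_parent(struct toy_graph *g, int child, int parent)
{
	for (int i = 0; i < MAX_PARENTS; i++) {
		if (g->parents[child][i] < 0) {
			g->parents[child][i] = parent;
			return;
		}
	}
	assert(0 && "too many parents for the toy graph");
}

/* Depth-first visit; the graph is a DAG, so no cycle check is needed. */
static void visit(struct toy_graph *g, int v)
{
	int lo_tree = INT_MAX, lo_graph = INT_MAX;

	for (int i = 0; i < MAX_PARENTS; i++) {
		int p = g->parents[v][i];
		if (p < 0)
			continue;
		if (!g->post[p]) {
			/* Tree edge: p belongs to the DFS subtree of v. */
			visit(g, p);
			if (g->min_tree[p] < lo_tree)
				lo_tree = g->min_tree[p];
		}
		/* Every edge, tree or not, contributes to min_graph. */
		if (g->min_graph[p] < lo_graph)
			lo_graph = g->min_graph[p];
	}
	g->post[v] = ++g->counter;
	g->min_tree[v] = lo_tree < g->post[v] ? lo_tree : g->post[v];
	g->min_graph[v] = lo_graph < g->post[v] ? lo_graph : g->post[v];
}

/* Label all commits, starting each DFS from the highest unvisited id. */
static void toy_graph_label(struct toy_graph *g)
{
	for (int v = g->nr - 1; v >= 0; v--)
		if (!g->post[v])
			visit(g, v);
}

/* Can commit v reach commit u by following parent links? */
static bool can_reach(const struct toy_graph *g, int v, int u)
{
	if (v == u)
		return true;
	/* Negative cut: post(u) outside [min_graph(v), post(v)] means "no". */
	if (g->post[u] < g->min_graph[v] || g->post[u] > g->post[v])
		return false;
	/* Positive cut: post(u) inside [min_tree(v), post(v)] means "yes". */
	if (g->min_tree[v] <= g->post[u])
		return true;
	/* Labels are inconclusive: fall back to walking the parents. */
	for (int i = 0; i < MAX_PARENTS; i++) {
		int p = g->parents[v][i];
		if (p >= 0 && can_reach(g, p, u))
			return true;
	}
	return false;
}
```

With such labels, a query like `git merge-base --is-ancestor A B` would
correspond to `can_reach(&g, B, A)` in this sketch.  Note that the
labels depend on the order in which the depth-first search picks its
starting points and parents, so a real implementation would have to fix
that order (and probably avoid recursion on deep histories).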