Re: [RFC] Possible idea for GSoC 2020

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Greetings Jakub

> So perhaps we should expand "Commit graph labeling for speeding up git
> commands" idea, splitting it into two possible ways of working in this
> project: the more technical 'Generation number v2', and 'Interval labels
> for commit graph' which is more of research project?  Which should be
> put first, then?

I would suggest working on generation number v2 first because:
- We ship improved performance *sooner*.
- I find it easier to shift from simpler to more complex systems.

On a personal note, I would be able to do a better job at working on generation
number v2 than on interval labels. While reading through the papers mentioned
was fun and full of a-ha moments, I find building things more
fulfilling. I would be
glad if either you or Derrick opt for mentoring it.

You could read my proposal at the link below. It is very rough and I haven't
proofread it yet. I will send out a more formal proposal once the direction
of this project is decided.

https://github.com/abhishekkumar2718/GSoC20/blob/master/graph_labelling.md

**Too long, didn't read**

- Commit graphs are small to medium-sized (compared to problem sizes observed in
graph-theory literature) sparse graphs. They have unusual properties compared
to more conventional graphs that can be exploited for better performance.

- Most of git's reachability queries are negative and using a negative-cut
filter improves performance more than a positive-cut filter.

- Implementing and maintaining a two-dimensional reachability index is hard
and does not offer justifiable performance gains.

- We plan to use corrected commit date as the generation number v2 because it is
locally computable, immutable and can be incrementally updated.

- If git ever considers a two-dimensional reachability index, either post order
DFS, GRAIL or an index based on commit date would be good starting
places to explore.

I go in more details about GRAIL, FERRARI and PReaCH, explaining
briefly how they work, their advantages, and disadvantages.

> Note that for example "Convert scripts to builtins" idea is in similar
> situation: it is also many projects in one.

Regards
Abhishek



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux