Re: [RFC] Possible idea for GSoC 2020

Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> · Tue, 17 Mar 2020 22:30:00 +0530

> Having such a complicated two-dimensional system would need to
> justify itself by being measurably faster than that one-dimensional
> system in these example commands.
>
> [...]
>
> My _prediction_ is that the two-dimensional system will be more
> complicated to write and use, and will not have any measurable
> difference. I'd be happy to be wrong, but I also would not send
> anyone down this direction only to find out I'm right and that
> effort was wasted.

Agreed. I have read the papers on the variants involved. On graphs
comparable in size to some of the largest Git repositories, they improve
performance by only about fifty nanoseconds per random query.

Additionally:
1. They require significantly more space per commit.
2. They require significantly more preprocessing time.

> My recommendation is that a GSoC student update the
> generation number to "v2" based on the definition you made in [1].
> That proposal is also more likely to be effective in Git because
> it makes use of extra heuristic information (commit date) to
> assist the types of algorithms we care about.

> In that case, the "difficult" part is moving the "generation"
> member of struct commit into a slab before making it a 64-bit
> value. (This is likely necessary for your plan, anyway.) Updating
> the generation number to v2 is relatively straight-forward after
> that, as someone can follow all places that reference or compute
> generation numbers and apply a diff

Thanks for the recommendation. Having read on the other thread how this
fits in with REU, I agree that updating the generation number to use
corrected commit date would be more appropriate for a GSoC project.
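
For reference, here is a minimal sketch of corrected commit date as I
understand the definition: a commit's value is the larger of its own
commit date and one more than the largest corrected date among its
parents. It is an illustration in Python, not Git's C code; the function
name and the dict-based graph are hypothetical, and it assumes the
history is acyclic.

```python
from typing import Dict, List


def corrected_commit_dates(
    commit_date: Dict[str, int],
    parents: Dict[str, List[str]],
) -> Dict[str, int]:
    """Compute a corrected commit date for every commit (hypothetical helper).

    commit_date maps a commit id to its committer timestamp.
    parents maps a commit id to the ids of its parents (missing means none).
    """
    corrected: Dict[str, int] = {}
    for start in commit_date:
        # Iterative depth-first walk, so deep histories do not hit
        # Python's recursion limit.
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in corrected:
                stack.pop()
                continue
            pending = [p for p in parents.get(commit, []) if p not in corrected]
            if pending:
                # Parents must be resolved before the commit itself.
                stack.extend(pending)
                continue
            floor = max(
                (corrected[p] + 1 for p in parents.get(commit, [])),
                default=0,
            )
            corrected[commit] = max(commit_date[commit], floor)
            stack.pop()
    return corrected


# B has a skewed clock: it is dated earlier than its parent A.
example = corrected_commit_dates(
    {"A": 100, "B": 90, "C": 200},
    {"B": ["A"], "C": ["A", "B"]},
)
```

In this example B ends up with 101 rather than 90, so a child always has
a strictly larger value than each of its parents even when commit dates
are skewed.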

Regards
Abhishek