Re: [PATCH 0/6] Compute and consume generation numbers

Derrick Stolee <stolee@xxxxxxxxx> · Sat, 7 Apr 2018 21:06:24 -0400

On 4/7/2018 12:55 PM, Jakub Narebski wrote:
> Currently I am at the stage of reproducing the results of the FELINE paper:
> "Reachability Queries in Very Large Graphs: A Fast Refined Online Search
> Approach" by Renê R. Veloso, Loïc Cerf, Wagner Meira Jr and Mohammed
> J. Zaki (2014).  This paper is available in PDF form at
> https://openproceedings.org/EDBT/2014/paper_166.pdf
>
> The Jupyter Notebook (which runs on Google Cloud, but can also be run
> locally) uses a Python kernel, the NetworkX library for graph
> manipulation, and matplotlib (via NetworkX) for display.
>
> Available at:
> https://colab.research.google.com/drive/1V-U7_slu5Z3s5iEEMFKhLXtaxSu5xyzg
> https://drive.google.com/file/d/1V-U7_slu5Z3s5iEEMFKhLXtaxSu5xyzg/view?usp=sharing
>
> I hope it is of help, or at least of interest.
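For readers unfamiliar with FELINE, here is a rough sketch of the core idea under stated assumptions. It is not the paper's exact algorithm. Each vertex gets coordinates (X, Y) from two topological orders of the DAG. If u reaches v, then X[u] < X[v] and Y[u] < Y[v]. A failed check therefore proves unreachability without any search, and the same check prunes the DFS when a search is needed. The names `topological_order`, `build_index`, and `reachable` are hypothetical. The tie-breaking rule for the second order (prefer the largest X) stands in for the paper's own heuristic, which this sketch does not claim to reproduce. Only the standard library is used, not NetworkX.

```python
import heapq
from collections import defaultdict


def topological_order(nodes, succ, key):
    """Kahn's algorithm; among vertices with in-degree zero, take the smallest key."""
    indeg = {v: 0 for v in nodes}
    for u in nodes:
        for w in succ[u]:
            indeg[w] += 1
    ready = [(key(v), v) for v in nodes if indeg[v] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, u = heapq.heappop(ready)
        order.append(u)
        for w in succ[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                heapq.heappush(ready, (key(w), w))
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


def build_index(nodes, edges):
    """Give each vertex (X, Y) coordinates taken from two topological orders."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    x_order = topological_order(nodes, succ, key=lambda v: v)
    X = {v: i for i, v in enumerate(x_order)}
    # Assumed heuristic: among ready vertices, prefer the one with the largest X.
    y_order = topological_order(nodes, succ, key=lambda v: -X[v])
    Y = {v: i for i, v in enumerate(y_order)}
    return X, Y, succ


def reachable(index, u, v):
    """Answer "does u reach v?" using the coordinate check plus a pruned DFS."""
    X, Y, succ = index
    if u == v:
        return True
    if not (X[u] < X[v] and Y[u] < Y[v]):
        return False  # negative cut: no path from u to v can exist
    stack, seen = [u], {u}
    while stack:
        w = stack.pop()
        for nxt in succ[w]:
            if nxt == v:
                return True
            # Only descend into vertices that could still lie on a path to v.
            if nxt not in seen and X[nxt] < X[v] and Y[nxt] < Y[v]:
                seen.add(nxt)
                stack.append(nxt)
    return False
```

Any two topological orders keep the check correct, because both orders respect every edge. The choice of the second order only affects how often the check rejects unreachable pairs early. For example, with edges E→C, E→D, C→B, D→B, B→A, F→A, the query C→D is rejected by the coordinates alone, with no DFS.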

Let me know when you can give numbers (either raw performance or # of 
commits walked) for real-world Git commit graphs. The Linux repo is a 
good example to use for benchmarking, but I also use the Kotlin repo 
sometimes as it has over a million objects and over 250K commits.
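One way to get the "number of commits walked" statistic is to instrument a simple ancestry walk and compare it with and without generation-number pruning. This sketch assumes the usual generation-number definition: a commit with no parents has generation 1, and any other commit has one more than the maximum generation of its parents. An ancestor always has a strictly smaller generation than its descendants. The walk can therefore stop at any commit, other than the target, whose generation is not greater than the target's. The dict-of-parents input and the functions `compute_generations` and `is_ancestor` are hypothetical. They are not Git's implementation, which walks commits in priority-queue order rather than with a plain DFS.

```python
def compute_generations(parents):
    """Generation numbers: 1 for a root commit, else 1 + max over parents.

    Uses an iterative post-order DFS so that long histories do not hit
    Python's recursion limit.
    """
    gen = {}
    for start in parents:
        stack = [(start, False)]
        while stack:
            commit, parents_done = stack.pop()
            if commit in gen:
                continue
            if parents_done:
                gen[commit] = 1 + max((gen[p] for p in parents[commit]), default=0)
            else:
                stack.append((commit, True))  # revisit once all parents are numbered
                stack.extend((p, False) for p in parents[commit] if p not in gen)
    return gen


def is_ancestor(parents, gen, start, target, use_generations=True):
    """Return (found, walked): whether target is reachable from start by
    following parent links, and how many commits the walk visited."""
    stack, seen, walked = [start], {start}, 0
    while stack:
        commit = stack.pop()
        walked += 1
        if commit == target:
            return True, walked
        if use_generations and gen[commit] <= gen[target]:
            continue  # target cannot be an ancestor of this commit
        for p in parents[commit]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return False, walked
```

Take a toy history a←b←c←d←e, with a side branch x whose parent is b, and a merge m of e and x. Asking whether x is an ancestor of e visits 5 commits without pruning (`(False, 5)`) but only 3 with generation numbers (`(False, 3)`). The walk stops at c because its generation equals x's.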

Of course, the only important statistic at the end of the day is the 
end-to-end time of a 'git ...' command. Your investigations should 
inform whether it is worth prototyping the feature in the git codebase.

Thanks,

-Stolee