On 4/7/2018 12:55 PM, Jakub Narebski wrote:
Currently I am at the stage of reproducing results in the FELINE paper: "Reachability Queries in Very Large Graphs: A Fast Refined Online Search Approach" by Renê R. Veloso, Loïc Cerf, Wagner Meira Jr and Mohammed J. Zaki (2014). This paper is available in PDF form at https://openproceedings.org/EDBT/2014/paper_166.pdf

The Jupyter Notebook (which runs on Google cloud, but can also be run locally) uses a Python kernel, the NetworkX library for graph manipulation, and matplotlib (via NetworkX) for display. Available at:
https://colab.research.google.com/drive/1V-U7_slu5Z3s5iEEMFKhLXtaxSu5xyzg
https://drive.google.com/file/d/1V-U7_slu5Z3s5iEEMFKhLXtaxSu5xyzg/view?usp=sharing

I hope that could be of help, or at least interesting.
Let me know when you can give numbers (either raw performance or # of commits walked) for real-world Git commit graphs. The Linux repo is a good example to use for benchmarking, but I also use the Kotlin repo sometimes as it has over a million objects and over 250K commits.
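To make "# of commits walked" concrete, here is a minimal sketch of the baseline such an index would be compared against: a plain breadth-first walk over parent links that counts how many commits it visits before answering a reachability query. It assumes the graph is given as the text output of `git rev-list --parents --all` (one line per commit: the commit followed by its parents). The helper names `parse_rev_list` and `can_reach` are hypothetical, and this is not how git itself implements the walk.

```python
from collections import deque


def parse_rev_list(text):
    """Parse `git rev-list --parents` style output into {commit: [parents]}."""
    parents = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            parents[fields[0]] = fields[1:]
    return parents


def can_reach(parents, start, target):
    """Plain BFS from `start` along parent links.

    Returns (reachable, commits_walked), where commits_walked is the
    number of commits dequeued before the answer was known.
    """
    seen = {start}
    queue = deque([start])
    walked = 0
    while queue:
        commit = queue.popleft()
        walked += 1
        if commit == target:
            return True, walked
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False, walked


# Toy history (an assumption for illustration): D merges B and C,
# which both descend from the root commit A.
sample = "D B C\nB A\nC A\nA\n"
graph = parse_rev_list(sample)
print(can_reach(graph, "D", "A"))
print(can_reach(graph, "B", "C"))
```

On a real repository the same count could be collected by feeding the actual `git rev-list --parents --all` output to `parse_rev_list`; the number is only a proxy and does not replace end-to-end timing.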
Of course, the only important statistic at the end of the day is the end-to-end time of a 'git ...' command. Your investigations should inform whether it is worth prototyping the feature in the git codebase.
Thanks,
-Stolee