Christian Couder <christian.couder@xxxxxxxxx> writes: > On Tue, Mar 17, 2020 at 3:18 PM Jakub Narebski <jnareb@xxxxxxxxx> wrote: >> Christian Couder <christian.couder@xxxxxxxxx> writes: >>> On Tue, Mar 17, 2020 at 4:13 AM Jakub Narebski <jnareb@xxxxxxxxx> wrote: [...] >>>> As I wrote below, such positive-cut filter would be directly helpful in >>>> performing the following commands: >>>> >>>> - `git merge-base --is-ancestor` >>>> - `git branch --contains` >>>> - `git tag --contains` >>>> - `git branch --merged` >>>> - `git tag --merged` >>>> >>>> It would be also useful for tag autofollow in git-fetch; is is N-to-M >>>> equivalent to 1-to-N / N-to-1 `--contains` queries. >>>> >>>> I am quite sure that positive-cut filter would make `--ancestry-path` >>>> walk faster. >>>> >>>> I think, but I am not sure, that positive-cut filter can make parts of >>>> topological sort and merge base algorithms at least a tiny bit faster. >>> >>> Is there an easy way to check that it would provide significant >>> performance improvements at least in some cases? Can we ask the >>> student to do that at the beginning of the GSoC? >> >> The "Reachability labels for version control graphs.ipynb" Jupyter >> Notebook on Google Colaboratory was created to answer this question >> (originally for the FELINE reachability index). Among others it can >> min-post interval labels and topological levels (generation numbers), >> use them for reachability queries, and load Linux kernel >> commit-graph. The exploration didn't get finished, but it would be not >> that difficult, I think, to at least find the amount of false negatives >> for min-post interval labeling for git.git or Linux kernel repo. >> >> https://colab.research.google.com/drive/1V-U7_slu5Z3s5iEEMFKhLXtaxSu5xyzg >> >> As Jupyter Notebook, it is run in the web browser. It can either use >> local runtime, or run on Google Cloud runtime. On the other hand it >> requires at least some knowledge of Python... > > Ok, if that's a possible approach to the project, then please add it > to the description. Added as a new paragraph in the 'interval labels' subtask description https://github.com/git/git.github.io/commit/ac526b18b97e53a431767ce147f27eaf6af0aa76 [...] >>>>> My recommendation is that a GSoC student update the >>>>> generation number to "v2" based on the definition you made in [1]. >>>>> That proposal is also more likely to be effective in Git because >>>>> it makes use of extra heuristic information (commit date) to >>>>> assist the types of algorithms we care about. >>>>> >>>>> In that case, the "difficult" part is moving the "generation" >>>>> member of struct commit into a slab before making it a 64-bit >>>>> value. (This is likely necessary for your plan, anyway.) Updating >>>>> the generation number to v2 is relatively straight-forward after >>>>> that, as someone can follow all places that reference or compute >>>>> generation numbers and apply a diff >>>> >>>> Good idea! Though I am not sure if it is not too late to add it to the >>>> https://git.github.io/SoC-2020-Ideas/ as the self imposed deadline of >>>> March 16 (where students can start submitting proposals to GSoC) has >>>> just passed. Christian, what do you think? >>> >>> Would that be a different project idea or part of your "Graph labeling >>> for speeding up git commands" project idea? >>> >>> I am very reluctant to add new project ideas at that time. I don't >>> think student will have time to properly research it and get it >>> reviewed. [...] >> So perhaps we should expand "Commit graph labeling for speeding up git >> commands" idea, splitting it into two possible ways of working in this >> project: the more technical 'Generation number v2', and 'Interval labels >> for commit graph' which is more of research project? > > Ok for that. > >> Which should be >> put first, then? > > Which ever you prefer. If you say that any part could be done > separately, that doesn't matter much. I have added 'Generation number v2' as one of alternative ways of working on the more generic "Commit graph labeling for speeding up git commands" idea -- as first task, because it fit better the narrative: https://github.com/git/git.github.io/commit/a6d59820709417081c334a5120342987ff046f1a Could you (or Stolee) check current proposal, so that it can be merged in? Thanks in advance. https://github.com/git/git.github.io/blob/soc-2020-idea-jnareb/SoC-2020-Ideas.md Best, -- Jakub Narębski