Re: [RFC] Possible idea for GSoC 2020

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Christian Couder <christian.couder@xxxxxxxxx> writes:
> On Tue, Mar 17, 2020 at 3:18 PM Jakub Narebski <jnareb@xxxxxxxxx> wrote:
>> Christian Couder <christian.couder@xxxxxxxxx> writes:
>>> On Tue, Mar 17, 2020 at 4:13 AM Jakub Narebski <jnareb@xxxxxxxxx> wrote:
[...]
>>>> As I wrote below, such positive-cut filter would be directly helpful in
>>>> performing the following commands:
>>>>
>>>>  - `git merge-base --is-ancestor`
>>>>  - `git branch --contains`
>>>>  - `git tag --contains`
>>>>  - `git branch --merged`
>>>>  - `git tag --merged`
>>>>
>>>> It would be also useful for tag autofollow in git-fetch; is is N-to-M
>>>> equivalent to 1-to-N / N-to-1 `--contains` queries.
>>>>
>>>> I am quite sure that positive-cut filter would make `--ancestry-path`
>>>> walk faster.
>>>>
>>>> I think, but I am not sure, that positive-cut filter can make parts of
>>>> topological sort and merge base algorithms at least a tiny bit faster.
>>>
>>> Is there an easy way to check that it would provide significant
>>> performance improvements at least in some cases? Can we ask the
>>> student to do that at the beginning of the GSoC?
>>
>> The "Reachability labels for version control graphs.ipynb" Jupyter
>> Notebook on Google Colaboratory was created to answer this question
>> (originally for the FELINE reachability index).  Among others it can
>> min-post interval labels and topological levels (generation numbers),
>> use them for reachability queries, and load Linux kernel
>> commit-graph. The exploration didn't get finished, but it would be not
>> that difficult, I think, to at least find the amount of false negatives
>> for min-post interval labeling for git.git or Linux kernel repo.
>>
>>   https://colab.research.google.com/drive/1V-U7_slu5Z3s5iEEMFKhLXtaxSu5xyzg
>>
>> As Jupyter Notebook, it is run in the web browser.  It can either use
>> local runtime, or run on Google Cloud runtime.  On the other hand it
>> requires at least some knowledge of Python...
>
> Ok, if that's a possible approach to the project, then please add it
> to the description.

Added as a new paragraph in the 'interval labels' subtask description
https://github.com/git/git.github.io/commit/ac526b18b97e53a431767ce147f27eaf6af0aa76

[...]
>>>>> My recommendation is that a GSoC student update the
>>>>> generation number to "v2" based on the definition you made in [1].
>>>>> That proposal is also more likely to be effective in Git because
>>>>> it makes use of extra heuristic information (commit date) to
>>>>> assist the types of algorithms we care about.
>>>>>
>>>>> In that case, the "difficult" part is moving the "generation"
>>>>> member of struct commit into a slab before making it a 64-bit
>>>>> value. (This is likely necessary for your plan, anyway.) Updating
>>>>> the generation number to v2 is relatively straight-forward after
>>>>> that, as someone can follow all places that reference or compute
>>>>> generation numbers and apply a diff
>>>>
>>>> Good idea!  Though I am not sure if it is not too late to add it to the
>>>> https://git.github.io/SoC-2020-Ideas/ as the self imposed deadline of
>>>> March 16 (where students can start submitting proposals to GSoC) has
>>>> just passed.  Christian, what do you think?
>>>
>>> Would that be a different project idea or part of your "Graph labeling
>>> for speeding up git commands" project idea?
>>>
>>> I am very reluctant to add new project ideas at that time. I don't
>>> think student will have time to properly research it and get it
>>> reviewed.
[...]

>> So perhaps we should expand "Commit graph labeling for speeding up git
>> commands" idea, splitting it into two possible ways of working in this
>> project: the more technical 'Generation number v2', and 'Interval labels
>> for commit graph' which is more of research project?
>
> Ok for that.
>
>>  Which should be
>> put first, then?
>
> Which ever you prefer. If you say that any part could be done
> separately, that doesn't matter much.

I have added 'Generation number v2' as one of alternative ways of
working on the more generic "Commit graph labeling for speeding up git
commands" idea -- as first task, because it fit better the narrative:
https://github.com/git/git.github.io/commit/a6d59820709417081c334a5120342987ff046f1a

Could you (or Stolee) check current proposal, so that it can be merged
in?  Thanks in advance.

  https://github.com/git/git.github.io/blob/soc-2020-idea-jnareb/SoC-2020-Ideas.md

Best,
-- 
Jakub Narębski




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux