[Outreachy][Proposal] Accelerate rename detection and the range-diff

Hey Everyone,

I would love to participate in Outreachy this year with Git on the
project "Accelerate rename detection and the range-diff command in
Git". I have contributed to the microproject "Unify the meaning of
dirty between diff and describe"[1], which is still under review, and
through that process I have become familiar with the mailing list
and patch review system. I am also contributing to another issue[2]
about `git bisect` and `git rebase`, which is still under
discussion[3].

[1] https://lore.kernel.org/git/pull.751.git.1602781723670.gitgitgadget@xxxxxxxxx
[2] https://github.com/gitgitgadget/git/issues/486
[3] https://lore.kernel.org/git/pull.765.git.1603271344522.gitgitgadget@xxxxxxxxx/

Coming to the project, I have read more about it[4] and have created
an initial version of the timeline. I would really appreciate
comments on it.

[4] https://github.com/gitgitgadget/git/issues/519

Also, there is a section for community-specific questions in the final
application. Is there anything specific that I should fill in there?

Please let me know if I missed anything.

Looking forward to working and learning with you all.

Thanks and Regards,
Sangeeta

=================================================

Link to docs: https://docs.google.com/document/d/15mgqy4id1fXZWE1NvBEERWvET9zy-ZEfhp4x0NNv_d4/edit?usp=sharing

=================================================

## Accelerate rename detection and the range-diff command in Git

# Timeline

## Nov 23 - Dec 1 (Before the internship officially starts)

* Getting to know the mentors.
* Bonding with the community.
* Understanding the structure of the code and familiarizing myself
with the requirements during the internship period.
* Create a concrete workflow for Outreachy tasks.


## Dec 1 - Dec 20

* Study various Approximate Nearest Neighbor (ANN) search algorithms.
* Review existing comparisons of ANN algorithms, such as:
  * [ANN benchmarks](http://ann-benchmarks.com/)
  * [How to benchmark ANN algorithms](https://medium.com/gsi-technology/how-to-benchmark-ann-algorithms-a9f1cef6be08)
* Compare the algorithms and narrow them down to the one or two that
best fit our use case.
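
For reference during the comparison, here is a minimal sketch of the exact, brute-force baseline that any approximate search would be measured against. It is illustrative only and does not reflect how Git scores similarity today; the Jaccard measure over sets of lines, the function names `jaccard` and `nearest_exact`, and the sample file contents are all assumptions made for this sketch.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_exact(query, candidates):
    """Return (name, score) of the candidate most similar to `query`.

    `candidates` maps a name to a set of lines. Every candidate is scored,
    so matching n queries against m candidates costs O(n * m) comparisons.
    """
    best_name, best_score = None, -1.0
    for name, lines in candidates.items():
        score = jaccard(query, lines)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical data: two deleted files and one added file to match.
deleted = {
    "old/util.c": {"int add(int a, int b)", "{", "return a + b;", "}"},
    "old/main.c": {"int main(void)", "{", "return 0;", "}"},
}
added = {"int add(int a, int b)", "{", "return a + b; /* sum */", "}"}

print(nearest_exact(added, deleted))
```

An approximate method would aim to return the same best match for most queries while scoring far fewer candidate pairs.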

## Dec 11: Initial point of feedback

* Take feedback from the mentors and ask about the expectations that
the mentors and the community have of me.

## Dec 21 - Jan 05

* Study how Locality Sensitive Hashing (data-independent) or
Locality Preserving Hashing (data-dependent) can improve accuracy
(or even reduce complexity).
* Study various hashing algorithms and how to combine them with the
nearest neighbor search algorithm.
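
To make the hashing idea concrete, the following is a minimal sketch of one locality-sensitive scheme, MinHash with banding, which targets Jaccard similarity between sets. It is only one candidate among the schemes to be studied, not a chosen design; the function names, the use of BLAKE2b as the underlying hash, and the parameters `num_hashes=32` and `bands=8` are assumptions for illustration.

```python
import hashlib
from collections import defaultdict


def _hash(seed, item):
    """Deterministic 64-bit hash of `item` under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=32):
    """MinHash signature of a non-empty set: per seed, the minimum hash value."""
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def candidate_pairs(signatures, bands=8):
    """Split each signature into bands; names sharing any band become candidates."""
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for names in buckets.values():
        ordered = sorted(names)
        for i, x in enumerate(ordered):
            for y in ordered[i + 1:]:
                pairs.add((x, y))
    return pairs


# Hypothetical data: two files with identical line sets.
files = {
    "a.c": {"x = 1", "y = 2", "z = 3"},
    "b.c": {"x = 1", "y = 2", "z = 3"},
}
sigs = {name: minhash_signature(lines) for name, lines in files.items()}
print(candidate_pairs(sigs))
```

Only the candidate pairs would then need an exact similarity check, which is where the savings over scoring every pair would come from.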

## Jan 06 - Jan 20
* Study whether a pre-trained Support Vector Machine (SVM) can add
something to our use case.
* Study how other projects (e.g., Gerrit) decide whether two commits
are similar.
* SVMs have an accuracy disadvantage compared to nearest neighbor
algorithms. Therefore, I would look into whether a hybrid algorithm
that uses both SVMs and nearest neighbor algorithms can achieve
better accuracy. There are some research papers on this, which I
would study before finalizing the algorithm after discussion with
the mentors and the community.

## Jan 12: Midpoint feedback
* Take feedback from the mentors and ask about ways I can improve
and places where I am lagging.

## Jan 21 - Feb 15
* Implement the finalized algorithm.
* Benchmark its accuracy and complexity against existing methods.
* Use it for rename detection and for commit matching in `git range-diff`.
* Update the documentation accordingly.
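
One way the accuracy benchmark could be framed is as recall against the exact method: of the matches the existing exhaustive approach finds, what fraction does the new algorithm also find? The sketch below assumes that framing; the function name `recall`, the dictionary representation of matches, and the sample paths are hypothetical.

```python
def recall(exact, approximate):
    """Fraction of exact matches that the approximate method also found.

    Both arguments map a source name to its matched destination name.
    """
    if not exact:
        return 1.0
    hits = sum(1 for src, dst in exact.items() if approximate.get(src) == dst)
    return hits / len(exact)


# Hypothetical results from an exhaustive matcher and an approximate one.
exact_matches = {"a.c": "x.c", "b.c": "y.c", "c.c": "z.c", "d.c": "w.c"}
approx_matches = {"a.c": "x.c", "b.c": "y.c", "c.c": "q.c"}

print(recall(exact_matches, approx_matches))
```

Running time on the same inputs would be recorded alongside this figure so that accuracy and speed can be compared together.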


## Feb 16 - Mar 02 (Wrap up)
* Buffer period for incomplete work.
* Wrap up the code.
* Address the reviews and suggestions given by mentors.
* Write documentation for the code if required.
* Get my patches merged.


## Mar 02: Final feedback
* Take the final feedback from the mentors and ask about ways I
could have improved.
* Talk about ways to stay connected after the Outreachy period.


## Post-Outreachy
* I intend to keep contributing even after the Outreachy period ends.
* I would love to co-mentor (if possible) in the next Outreachy and GSoC rounds.
* I would love to review patches from other contributors and take part in
the mailing list discussions.


# Other Involvements
* Blogging is an important part of Outreachy, so I would love to
write a blog post every weekend or every fortnight, as agreed with
the mentors, summarizing the work done so far, anything I learned
that week, and my experience.
* I would also be glad to help other contributors and users solve
their issues and to help the maintainers review patches during the
Outreachy period and afterwards.



