Hey everyone,

I would love to participate in Outreachy this year with Git, on the project "Accelerate rename detection and the range-diff command in Git". I have contributed to the microproject "Unify the meaning of dirty between diff and describe"[1], which is still under review; through that process I have become familiar with the mailing list and the patch review system. I am also contributing to another issue[2], about `git bisect` and `git rebase`, which is still under discussion[3].

[1] https://lore.kernel.org/git/pull.751.git.1602781723670.gitgitgadget@xxxxxxxxx
[2] https://github.com/gitgitgadget/git/issues/486
[3] https://lore.kernel.org/git/pull.765.git.1603271344522.gitgitgadget@xxxxxxxxx/

Coming to the project, I have read more about it[4] and have created an initial version of the timeline. I would really love to have comments on it.

[4] https://github.com/gitgitgadget/git/issues/519

Also, there is a column for community-specific questions in the final application. Is there anything specific that I should fill in there?

Please let me know if I missed anything. Looking forward to working and learning with you all.

Thanks and Regards,
Sangeeta

=================================================
Link to docs: https://docs.google.com/document/d/15mgqy4id1fXZWE1NvBEERWvET9zy-ZEfhp4x0NNv_d4/edit?usp=sharing
=================================================

## Accelerate rename detection and the range-diff command in Git

# Timeline

## Nov 23 - Dec 1 (before the internship officially starts)

* Get to know the mentors.
* Bond with the community.
* Understand the structure of the code and familiarize myself with the requirements for the internship period.
* Create a concrete workflow for the Outreachy tasks.

## Dec 1 - Dec 20

* Study various Approximate Nearest Neighbor (ANN) search algorithms.
* There are several published comparisons of Approximate Nearest Neighbor algorithms, such as:
  * [ANN benchmarks](http://ann-benchmarks.com/)
  * [How to benchmark ANN algorithms](https://medium.com/gsi-technology/how-to-benchmark-ann-algorithms-a9f1cef6be08)
* I would compare the algorithms and narrow them down to the one or two best suited to our use case.

## Dec 11: Initial point of feedback

* I would take feedback from the mentors and ask about the expectations that the mentors and the community have of me.

## Dec 21 - Jan 05

* I would study how Locality Sensitive Hashing (data-independent) or Locality Preserving Hashing (data-dependent) can improve our accuracy (or even our complexity).
* I would study various hashing algorithms and combine them with our nearest neighbor search algorithm.

## Jan 06 - Jan 20

* Study whether a pre-trained Support Vector Machine (SVM) can add something to our use case.
* Study how other projects (e.g. Gerrit) decide whether two commits are similar.
* SVMs have an accuracy disadvantage compared to nearest neighbor algorithms. Therefore, I would look into whether a hybrid algorithm that combines SVMs with nearest neighbor algorithms can achieve better accuracy. There are also some research papers on this. I would study them and finalize the algorithm after discussion with the mentors and the community.

## Jan 12: Midpoint feedback

* I would take feedback from the mentors and ask where I can improve and where I am lagging.

## Jan 21 - Feb 15

* Implement the finalized algorithm.
* Benchmark its accuracy and complexity against the existing methods.
* Use it for rename detection and for commit matching in `git range-diff`.
* Update the documentation accordingly.

## Feb 16 - Mar 02 (Wrap up)

* Buffer period for incomplete work.
* Wrap up the code.
* Address the reviews and suggestions given by the mentors.
* Write documentation for the code if required.
* Get my patches merged.
## Mar 02: Final feedback

* I would take the final feedback from the mentors and ask where I could have done better.
* I would talk about ways to stay connected after the Outreachy period.

## Post-Outreachy

* I intend to keep contributing after the Outreachy period ends.
* I would love to co-mentor (if possible) in the next Outreachy and GSoC rounds.
* I would love to review patches from other contributors and take part in the mailing list discussions.

# Other Involvements

* Blogging is an important part of Outreachy, so I would love to write a blog post every week or every fortnight, as agreed with the mentors, summarizing the work done so far, what I learned in that period, and my experience.
* I would also be glad to help other contributors and users solve their issues, and to help the maintainers review patches during the Outreachy period and afterwards.
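
To make the Locality Sensitive Hashing item in the Dec 21 - Jan 05 part of the timeline more concrete, here is a minimal Python sketch of one possible approach: MinHash signatures with banding, used to shortlist file pairs that might be renames. It is an illustration under assumptions, not code taken from Git and not the algorithm that would necessarily be chosen: the function names, the 4-character shingles, the 64 hash functions and the 16 bands are all hypothetical choices, and the real implementation would be written in C.

```python
import hashlib


def shingles(text, k=4):
    """Return the set of all k-character substrings of text (assumed feature set)."""
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash_signature(items, num_hashes=64):
    """Build a MinHash signature: the minimum of each seeded hash over a non-empty set."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def candidate_pairs(signatures, bands=16):
    """Return pairs of names whose signatures agree on at least one whole band."""
    buckets = {}
    for name, sig in signatures.items():
        rows = len(sig) // bands
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets.setdefault(key, set()).add(name)
    pairs = set()
    for names in buckets.values():
        ordered = sorted(names)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second))
    return pairs


if __name__ == "__main__":
    # Hypothetical file contents: new.c is a lightly edited copy of old.c.
    files = {
        "old.c": "int main(void) { puts(\"hello\"); return 0; }",
        "new.c": "int main(void) { puts(\"hello, world\"); return 0; }",
        "README": "Git is a fast, scalable, distributed revision control system.",
    }
    sigs = {name: minhash_signature(shingles(body)) for name, body in files.items()}
    print(candidate_pairs(sigs))
    print(estimated_similarity(sigs["old.c"], sigs["new.c"]))
```

The point of the banding step is that only files sharing a bucket need an exact similarity check, so the number of expensive pairwise comparisons can shrink when most files are unrelated. Whether this trade-off helps in practice is exactly what the benchmarking in the Jan 21 - Feb 15 period would have to show.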