This proposal can also be read at
https://docs.google.com/document/d/1JBznA5n0WdWsbEskCeXxOnQuaa0urD89VtprxstLPzo/edit?usp=sharing

Unify ref-filter formats with other --pretty formats

Personal Info
=============

Full Name: Venkata Sai Sri Kousik Sanagavarapu
E-mail: five231003@xxxxxxxxx
Ph. No.: +91 6304308245
Alt. Ph. No.: +91 9704654555
Education: Vasavi College of Engineering, Hyderabad
Year: II / IV
Semester: IV / XIII
Degree: Bachelor of Engineering in Electronics and Communication Engineering
Github: https://github.com/five-sh

Overview
========

Git has an old problem of duplicated implementations of some logic. For example, Git has at least 4 different implementations to format command output for different commands. The goal of this project is to reduce these duplications and work towards a single implementation to format command output. There is more than one way to do this and there has been work done on this by GSoC students and Outreachy interns before me.

The expected project size is 175 hours or 350 hours and the difficulty level is medium.

Pre GSoC
========

I first got into Git’s source code around October, 2022 and have been going through code of topics that I found interesting whenever I had some time away from my college work.

The following are the patches that I submitted, from earliest to the latest:

[PATCH] repository-version.txt: partialClone casing change
Status: merged into master
Commit: 29c550f0a
Merge Commit: 859899ddc (branch: ks/partialclone-casing)
Description: This was my first patch to Git. I had found that the configuration variable extensions.partialClone had a typo in the way it was documented, while reading the documentation surrounding partial clones. Now that I look at it again, it seems that the patch was kind of noisy because the config variable would have still worked with no emphasis on the case but I guess it’s good to have everything going in one pattern, for the sake of documentation.
Mailing list: https://lore.kernel.org/git/20221110160556.29557-1-five231003@xxxxxxxxx/

[RFC][PATCH] object.c: use has_object() instead of repo_has_object_file()
Status: Peff and others took off from here
Description: This again was kind of a search-and-replace type of patch. I wasn’t really sure of the code and made this change as a result of the comment surrounding repo_has_object_file(), which says that this and related functions are deprecated (hence the RFC). Peff reviewed the patch and explained this function and its use in the particular case where I made the change, which was really helpful and added to my knowledge. Peff also realized that there were changes to be done to the logic of parse_object() (the function in which I made the change) and submitted patches, which were in turn reviewed by Ævar and he submitted changes in response to that. I now think that I should have replied to the review and taken part in the discussion, which would have led to me learning something more, but I was so overwhelmed that I didn’t do it. I corrected this in my later patches.
Mailing list: https://lore.kernel.org/git/20221116163956.1039137-1-five231003@xxxxxxxxx/

[PATCH] merge: use reverse_commit_list() for list reversal (Microproject)
Status: Discontinued
Description: This was a change I made to address issue #1156 on gitgitgadget. This was, however, not a correct change logic-wise, because the reverse_commit_list() function modifies the list in place (that is, it uses the elements of the original list to make the reversed list); such a modification could break merge if we had multiple merge strategies.
Mailing list: https://lore.kernel.org/git/20230202165137.118741-1-five231003@xxxxxxxxx/

[PATCH] commit: warn the usage of reverse_commit_list() helper
Status: Discontinued
Description: This change was made based on the preceding patch, but according to the review such an addition to the comment was unnecessary, as the original comment was clear enough.
Mailing list: https://lore.kernel.org/git/20230207150359.177641-1-five231003@xxxxxxxxx/

[PATCH v4] index-pack: remove fetch_if_missing=0
Status: Discontinued
Description: This change strove to remove the use of fetch_if_missing in index-pack by replacing has_object_file() with has_object(), which does not lazy-fetch when an object is missing in a partial clone. A test was also added to make sure that this change did not lazy-fetch. This patch was discontinued because it was decided, as a result of discussion, that it would be better to check all the cases where fetch_if_missing is set to 1 and make changes there so that we either fetch efficiently or do not fetch at all. By doing this, in the final world-view, we can remove fetch_if_missing from index-pack as it would be set to zero everywhere.
Mailing list: https://lore.kernel.org/git/20230317175601.4250-1-five231003@xxxxxxxxx/

Proposed Project
================

Goal
====

The goal of this project is, as the title says, unifying ref-filter formats with other pretty formats. It would be great to have a single interface that takes care of all the formatting, instead of having different logic to implement different formatting options.

Quoting from the mailing list discussion
https://lore.kernel.org/git/CAL21BmnU2aTT_8iqejurgKeHXk-kmmGK1tmXLcVh7G12rwRPOw@xxxxxxxxxxxxxx/

“For example, 'short' in pretty means 'commit %(objectname)%0aAuthor: %(author)' in ref-filter”

Previous Work
=============

There has been much work done in the past in this area. Most of it comes from previous Outreachy interns and GSoC students.

Olga Telezhnaia <olyatelezhnaya@xxxxxxxxx> did work in this area in the fields of `cat-file` and `ref-filter` as a part of her Outreachy Internship titled “Unifying Git’s format languages”. This work and also the work done after that helped take ref-filter to a more general setting.
She blogged about her work here
https://medium.com/@olyatelezhnaya

Hariom Verma <hariom18599@xxxxxxxxx> did work in this area as his GSoC project titled “Unify ref-filter formats with other --pretty formats”. This is the major work done in this area and the final report can be read at
https://harry-hov.github.io/blogs/posts/the-final-report
This work is very useful, as it serves as a kind of documentation and a starting point to work towards the goal.

ZheNing Hu <adlternative@xxxxxxxxx> has done major work under his GSoC project titled “Use ref-filter formats in git cat-file” in the area of git cat-file, but more relevant to this project are the changes done to ref-filter. This work was a continuation of Olga’s work and made some changes to ref-filter logic. His final report can be read here
https://github.com/adlternative/adlternative.github.io/blob/gh-pages/blogs/gsoc/GSOC-Git-Final-Blog.md

Nsengiyumva Wilberforce <nsengiyumvawilberforce@xxxxxxxxx> did work in this area as a part of his Outreachy Internship titled “Unify ref-filter formats with other --pretty formats”. He got rid of the duplicate implementation of the `signature` atom logic. This work can be read here
https://lore.kernel.org/git/20230311210607.64927-1-nsengiyumvawilberforce@xxxxxxxxx/

Difficulties
============

A major difficulty is backward compatibility, so any changes made to remove the duplicated logic must be made very carefully. Any new tests added must also be very precise so as to efficiently test the changes that are made.

There are also minor difficulties, such as the older tests failing because of the changes made, so the work will have to be done in such a way that those tests pass while the duplicated logic is refactored.

The Plan
========

I think Hariom’s final report of his GSoC project is a good starting point for working on the project.
The report lists the work which is left in the “WHATS LEFT?” section, so I think the first issue to work on would be to look into why “Around 30% of the log tests are failing” and to work in the area of mbox/email formatting for commits. Work can also be done to make pretty handle unknown formatting options.

From here, I can work on the remaining portion of the formats and can remove the duplicated logic wherever possible, also writing tests to ensure that everything works. I can take the approach similar to what Hariom did before this.

Estimated Timeline
==================

Misc
April 5 to May 3
- Continue to work on git and get more familiar with the code.
- Find and fix stuff.
- Work on stuff that interests me.

Community Bonding
May 4 to May 28
- Get myself familiar with the code of ref-filter.{c, h} and pretty.{c, h}.
- Communicate with my mentors about the approaches that can be taken to get to the goal.
- Working on Hariom’s branches (mentioned in his final report) and making changes on top of them.

Coding Phase I
May 29 to July 14
- Convert formatting options to reuse ref-filter formatting logic.
- Update existing tests and add new tests.
- Update documentation.

Coding Phase II
July 14 to August 21
- Further convert formatting options to reuse ref-filter formatting logic and teach pretty to handle them.
- Update existing tests and add new tests.
- Update documentation.

Final Coding Phase
August 21 to August 28
- Wrap up and fix bugs (if any).
- Update about the remaining stuff (if any).
- Make a final report outlining future work.

Blogging about Git
==================

I think blogging is one of the important parts of any project. It helps other people understand what one is doing and helps the person get to a better understanding of their work.
I will blog about the project every week. The blogs can be read at
https://five-sh.github.io/

Availability
============

I will be having my semester mostly throughout the summer and so will be able to work 35-40 hours per week. I will always be able to dedicate more time towards the project on the weekends. I will be in contact through my email and my phone. I am also open to calls and online meets.

Post GSoC
=========

I love being a part of the Git community. The whole process of getting to work on Git’s code, submitting patches and getting reviews is a new and great experience for me. I plan to stay in the community after GSoC, and will continue contributing to Git and learning from all of you. I am also open to co-mentoring or mentoring if ever given the chance.

I am also very interested in partial clones and I hope to work in that area.

Closing (optional)
==================

Ever since I first got into Git’s code and its community back in 2022, it has been a unique and great experience for me. I have learned so much in the past few months and will continue to do so from all of you here at Git.

Hariom's proposal has been a great resource in writing this proposal.

Thanks & Regards,
Kousik Sanagavarapu