[GSoC][Proposal] Unify ref-filter formats with other --pretty formats

This proposal can also be read at
https://docs.google.com/document/d/1JBznA5n0WdWsbEskCeXxOnQuaa0urD89VtprxstLPzo/edit?usp=sharing

Unify ref-filter formats with other --pretty formats

Personal Info
=============
Full Name: Venkata Sai Sri Kousik Sanagavarapu
E-mail: five231003@xxxxxxxxx
Ph. No.: +91 6304308245
Alt. Ph. No.: +91 9704654555

Education: Vasavi College of Engineering, Hyderabad
Year: II / IV
Semester: IV / VIII
Degree: Bachelor of Engineering in
	    Electronics and Communication Engineering

Github: https://github.com/five-sh

Overview
========
Git has an old problem of duplicated implementations of some logic.
For example, Git has at least 4 different implementations to format
command output for different commands.

The goal of this project is to reduce these duplications and work
towards a single implementation to format command output. There is
more than one way to do this and there has been work done on this
by GSoC students and Outreachy interns before me.

The expected project size is 175 hours or 350 hours and the difficulty
level is medium.

Pre GSoC
========
I first got into Git’s source code around October 2022 and have been
reading the code of topics that I found interesting whenever I had
some time away from my college work. The following are the patches that
I submitted, from earliest to latest:

[PATCH] repository-version.txt: partialClone casing change
Status: merged into master
Commit: 29c550f0a
Merge Commit: 859899ddc (branch: ks/partialclone-casing)
Description:
This was my first patch to Git. While reading the documentation on
partial clones, I found that the configuration variable
extensions.partialClone was documented with the wrong casing. Looking
at it again, the patch was somewhat noisy, because the config variable
would still have worked regardless of case, but I think it is good to
have the documentation follow one pattern.

Mailing list:
https://lore.kernel.org/git/20221110160556.29557-1-five231003@xxxxxxxxx/

[RFC][PATCH] object.c: use has_object() instead of
		       repo_has_object_file()
Status: Peff and others took off from here
Description:
This was again a search-and-replace kind of patch. I wasn’t really
sure of the code and made this change because of the comment on
repo_has_object_file(), which says that this and related functions
are deprecated (hence the RFC). Peff reviewed the patch and explained
the function and its use in the particular case where I made the
change, which was really helpful and added to my knowledge. Peff also
realized that the logic of parse_object() (the function in which I
made the change) needed changes and submitted patches, which were in
turn reviewed by Ævar, and he submitted changes in response to that.

I now think that I should have replied to the review and taken part
in the discussion, which would have taught me more, but I was so
overwhelmed that I didn’t. I corrected this in my later patches.

Mailing list:
https://lore.kernel.org/git/20221116163956.1039137-1-five231003@xxxxxxxxx/

[PATCH] merge: use reverse_commit_list() for list reversal
(Microproject)
Status: Discontinued
Description:
This was a change I made to address issue #1156 on gitgitgadget.
It was, however, not correct logic-wise: the reverse_commit_list()
function modifies the list in place (that is, it reuses the elements
of the original list to build the reversed list), and such a
modification could break merge if we had multiple merge strategies.
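
To illustrate what reversing "in place" means here, the following is a
minimal sketch in C. It assumes a simplified stand-in list: `struct node`
and `reverse_in_place()` are hypothetical names made up for this example
and are not Git's actual data structures or code.

```c
#include <stddef.h>

/* Simplified stand-in for a list node (hypothetical, not Git's struct). */
struct node {
	int value;
	struct node *next;
};

/*
 * Reverse a singly linked list in place: the existing nodes are
 * relinked and nothing is allocated, so the original ordering is
 * gone once the function returns.
 */
struct node *reverse_in_place(struct node *list)
{
	struct node *reversed = NULL;

	while (list) {
		struct node *next = list->next;

		list->next = reversed; /* point this node at the part already reversed */
		reversed = list;
		list = next;
	}
	return reversed;
}
```

After the call, a caller that still holds the old head pointer sees a
one-element list, because the old head is now the tail and its `next`
is NULL. Any later code that walks the list from that old pointer
would therefore miss the other elements.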

Mailing list:
https://lore.kernel.org/git/20230202165137.118741-1-five231003@xxxxxxxxx/

[PATCH] commit: warn the usage of reverse_commit_list() helper
Status: Discontinued
Description:
This change was made based on the preceding patch but according to the
review it seems that such an addition to the comment was unnecessary
as the original comment was clear enough.

Mailing list:
https://lore.kernel.org/git/20230207150359.177641-1-five231003@xxxxxxxxx/

[PATCH v4] index-pack: remove fetch_if_missing=0
Status: Discontinued
Description:
This change strove to remove the use of fetch_if_missing in index-pack
by replacing has_object_file() with has_object() which does not
lazy-fetch when an object is missing in a partial clone. A test was
also added to make sure that this change did not lazy-fetch.

This patch was discontinued because the discussion concluded that it
would be better to check all the cases where fetch_if_missing is set
to 1 and make changes there so that we either fetch efficiently or do
not fetch at all. Once that is done, fetch_if_missing can be removed
from index-pack, as it would be set to zero everywhere.

Mailing list:
https://lore.kernel.org/git/20230317175601.4250-1-five231003@xxxxxxxxx/

Proposed Project
================
Goal
====
The goal of this project is, as the title says, to unify ref-filter
formats with other pretty formats. It would be great to have a single
interface that takes care of all the formatting, instead of different
logic for each set of formatting options. Quoting from the mailing
list discussion:

https://lore.kernel.org/git/CAL21BmnU2aTT_8iqejurgKeHXk-kmmGK1tmXLcVh7G12rwRPOw@xxxxxxxxxxxxxx/

“For example, 'short' in pretty means 'commit %(objectname)%0aAuthor: %(author)'
in ref-filter”
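
The idea in the quote, a named pretty format being an alias for a
ref-filter format string, can be sketched as a small lookup table in C.
This is only an illustration under stated assumptions: `struct
format_alias` and `pretty_to_ref_filter()` are hypothetical names that
do not exist in Git, and the single table entry is the one pair given
in the quoted mail.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pairing of a pretty format name with a ref-filter format. */
struct format_alias {
	const char *pretty_name;
	const char *ref_filter_format;
};

static const struct format_alias format_aliases[] = {
	/* The only pair taken from the quoted mail; nothing else is assumed. */
	{ "short", "commit %(objectname)%0aAuthor: %(author)" },
};

/*
 * Return the ref-filter format string for a pretty format name,
 * or NULL if the name is not in the table.
 */
const char *pretty_to_ref_filter(const char *pretty_name)
{
	size_t i;

	for (i = 0; i < sizeof(format_aliases) / sizeof(format_aliases[0]); i++) {
		if (!strcmp(format_aliases[i].pretty_name, pretty_name))
			return format_aliases[i].ref_filter_format;
	}
	return NULL;
}
```

With a table like this, a caller would translate the name once and hand
the resulting string to a single formatting implementation, rather than
having a separate code path per named format.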

Previous Work
=============
Much work has been done in this area in the past, mostly by previous
Outreachy interns and GSoC students.

Olga Telezhnaia <olyatelezhnaya@xxxxxxxxx> worked on `cat-file` and
`ref-filter` as a part of her Outreachy Internship titled “Unifying
Git’s format languages”. This work, and the work done after it, helped
make ref-filter more general-purpose. She blogged about her work here

https://medium.com/@olyatelezhnaya


Hariom Verma <hariom18599@xxxxxxxxx> did work in this area as his GSoC
project titled “Unify ref-filter formats with other --pretty formats”.
This is the major work done in this area and the final report can be
read at

https://harry-hov.github.io/blogs/posts/the-final-report

This work is very useful, as it serves as a kind of documentation
and a starting point for working towards the goal.

ZheNing Hu <adlternative@xxxxxxxxx> has done major work under his GSoC
project titled “Use ref-filter formats in git cat-file” in the area of
git cat-file, but more relevant to this project are the changes done to
ref-filter. This work was a continuation of Olga’s work and made some
changes to ref-filter logic. His final report can be read here

https://github.com/adlternative/adlternative.github.io/blob/gh-pages/blogs/gsoc/GSOC-Git-Final-Blog.md

Nsengiyumva Wilberforce <nsengiyumvawilberforce@xxxxxxxxx> did work in
this area as a part of his Outreachy Internship titled “Unify ref-filter
formats with other --pretty formats”. He got rid of the duplicate
implementation of the `signature` atom logic. This work can be read here

https://lore.kernel.org/git/20230311210607.64927-1-nsengiyumvawilberforce@xxxxxxxxx/

Difficulties
============
A major difficulty is backward compatibility, so any change that
removes duplicated logic would need to be made very carefully.
Any new tests must also be precise enough to actually exercise the
changes that are made.

There are also minor difficulties, such as older tests failing
because of the changes, so the work will have to be done in such a way
that those tests still pass while the duplicated logic is refactored.

The Plan
========
I think Hariom’s final report of his GSoC project is a good starting
point for working on the project. The report lists the work which is
left in the “WHATS LEFT?” section, so I think the first issue to work
on would be to look into why “Around 30% of the log tests are failing”
and to work in the area of mbox/email formatting for commits. Work can
also be done to make pretty handle unknown formatting options.

From here, I can work on the remaining portion of the formats and can
remove the duplicated logic wherever possible, also writing tests to
ensure that everything works.

I can take an approach similar to the one Hariom took.

Estimated Timeline
==================

Misc
April 5 to May 3
- Continue to work on git and get more familiar with the code.

- Find and fix stuff.

- Work on stuff that interests me.

Community Bonding
May 4 to May 28
- Get myself familiar with the code of ref-filter.{c,h} and
  pretty.{c,h}.

- Communicate with my mentors about the approaches that can
  be taken to get to the goal.

- Work on Hariom’s branches (mentioned in his final report)
  and make changes on top of them.

Coding Phase I
May 29 to July 14
- Convert formatting options to reuse ref-filter formatting logic.

- Update existing tests and add new tests.

- Update documentation.

Coding Phase II
July 14 to August 21
- Further convert formatting options to reuse ref-filter formatting
  logic and teach pretty to handle them.

- Update existing tests and add new tests.

- Update documentation.

Final Coding Phase
August 21 to August 28
- Wrap up and fix bugs (if any).

- Update about the remaining stuff (if any).

- Make a final report outlining future work.

Blogging about Git
==================
I think blogging is one of the important parts of any project. It
helps other people understand what one is doing and helps the person
get to a better understanding of their work. I will blog about the
project every week; the posts can be read at

https://five-sh.github.io/

Availability
============
I will be having my semester mostly throughout the summer and so will
be able to work 35-40 hours per week. I will always be able to dedicate
more time towards the project on the weekends.

I will be in contact through my email and my phone.

I am also open to calls and online meets.

Post GSoC
=========
I love being a part of the Git community. The whole process of getting
to work on Git’s code, submitting patches and getting reviews is a new
and great experience for me. I plan to stay in the community after
GSoC, to keep contributing to Git and to keep learning from all of
you.

I am also open to co-mentoring or mentoring if ever given the chance.

I am also very interested in partial clones and I hope to work in that
area.

Closing (optional)
==================
Ever since I first got into git’s code and its community back in 2022,
it has evolved into a very unique and great experience for me. I have
learned so much in the past few months and will continue to do so from
all of you here at git.

Hariom's proposal has been a great resource in writing this proposal.

Thanks & Regards,
Kousik Sanagavarapu


