This is V2 of the draft proposal, after applying the
changes suggested by Christian Couder and Kaartic Sivaraam.
Thank you for your reviews :).
==================================================
Convert submodule to builtin
March 2021
==================================================
##Personal Information##
Name - Chinmoy Chakraborty
E-mail - chinmoy12c@xxxxxxxxx
Github - https://github.com/chinmoy12c
Linkedin - https://www.linkedin.com/in/chinmoy12c/
Major - Information Technology
Time Zone - IST (UTC+05:30)
=================================================
##Work Environment##
I am fluent in C, Java, Python, and Shell script. I use Git
as my VCS, Visual Studio Code as my primary code editor, and
Kali Linux as my primary OS.
=================================================
##Git Contributions##
[Microproject] Replace instances of `the_repository` with ‘r’. (Learning
the ropes)
Pull request: https://github.com/gitgitgadget/git/pull/915
Mailing List:
https://lore.kernel.org/git/pull.915.git.1616701733901.gitgitgadget@xxxxxxxxx/
[column, range-diff] downcase option description
Pull request: https://github.com/gitgitgadget/git/pull/920
Mailing List:
https://lore.kernel.org/git/pull.920.git.1616913298624.gitgitgadget@xxxxxxxxx/
[Documentation] updated documentation for git commit --date
Pull request: https://github.com/gitgitgadget/git/pull/918
Mailing List:
https://lore.kernel.org/git/pull.918.git.1616926790227.gitgitgadget@xxxxxxxxx/
=================================================
##Project Outline##
A few components of git, like `git-submodule.sh`,
are still shell scripts. This causes problems in
production on platforms like Windows. The goal of
this project is to convert `git-submodule.sh` into
portable C code. The end goal would be
to completely remove `git-submodule.sh` and rename
`builtin/submodule--helper.c` to `builtin/submodule.c`
so that `git submodule` is fully implemented in C.
=================================================
##Why is the project required?##
"Issues with the portability of code"
The submodule shell script uses shell commands like
`echo`, `grep`, `test`, `printf` etc. When switching
to non-POSIX compliant systems, one will have
to re-implement these commands specifically for the
system. There are also POSIX-to-Windows path conversion
issues. To fix these issues, it was decided to convert
these scripts into portable C code.
"Large overhead in calling the command"
The commands implemented in shell scripts are not builtins, so
they call `fork()` and `exec()` for every external command they
run, and create additional shells along the way. This adds time
and memory overhead to every use of these commands.
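To make the cost concrete, here is a minimal sketch of what running a
single external command involves on a POSIX system. The helper name
`spawn_and_wait` is hypothetical and is not part of git. It only
illustrates the fork/exec/wait cycle that a shell script repeats for
each `grep`, `sed` or `git` invocation, and that a builtin avoids by
calling C functions in-process.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of git): run one external command
 * the way a shell does, with one fork(), one exec and one wait.
 * Returns the child's exit status, or -1 on failure.
 */
static int spawn_and_wait(char *const argv[])
{
	int status;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);

	pid = fork();                 /* duplicate the current process */
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);    /* replace the child with the command */
		_exit(127);               /* only reached if exec failed */
	}
	if (waitpid(pid, &status, 0) < 0) /* parent blocks until the child exits */
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A script that runs a dozen such commands pays this cost a dozen times
per invocation.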
"No access to low-level API"
Shell scripts cannot call git’s internal C functions directly.
They can only reach that functionality through commands like
`git hash-object`, `git cat-file` etc. As these are internally
required for the submodule commands to work, the shell script
needs to spawn a separate process to execute each of them.
=================================================
##How have I prepared?##
I have gone through all the previous works and read through their
code to make myself accustomed to the intricacies of the code.
I have also structured my workflow based on the observation of
the previous discussions on those patches, and taken into
consideration the issues faced previously.
=================================================
##Previous Work##
A large part of `git-submodule.sh` has already been converted
into `builtin/submodule--helper.c` by Stefan Beller, by
Prathamesh Chavan in his GSoC project in 2017, and by Shourya
Shukla in his GSoC project in 2020. This is the list of
already ported commands.
set-branch
set-url
summary
status
init
deinit
update
foreach
sync
absorbgitdirs
=================================================
##Work to be done##
The only command that is left to be ported is `git submodule add`.
The previous work on this, by Shourya Shukla in GSoC 2020, was
not merged due to some issues in design, and the patch was
dropped after it had been stale for a long time.
See:
https://github.com/git/git/blob/1861aa482a38ae84f597cede48167ab43e7e50a3/whats-cooking.txt#L1158-L1169
The first and foremost aim of the project will be to finish
porting the `add` command. Thereafter, the end goal would be to
completely replace the shell script (git-submodule.sh) with
efficient C code.
Before porting the `git submodule add` command the initial work
would be dedicated to the implementation of small helper functions
in the form of small patches, which would be directly used by the
`add` command. This workflow is based on the suggestion by
Junio C Hamano on the thread:
https://lore.kernel.org/git/xmqqd01sugrg.fsf@xxxxxxxxxxxxxxxxxxxxxx/.
This workflow would help in the following ways:
- It would help in sending patches in a small digestible format.
- It would help the reviewers easily review those small units
of patches in a single sitting.
- It would help keep small logical units of code in different clean commits.
An additional test tweak would also be required in
`t7400-submodule-basic.sh`,
to prepend the keyword ‘fatal’ to the expected error message,
since the command dies when there are no commits, as pointed out
by Kaartic Sivaraam on the thread:
https://lore.kernel.org/git/ce151a1408291bb0991ce89459e36ee13ccdfa52.camel@xxxxxxxxx/.
The following helper functions would be required to be implemented -
- A function to guess the path name from the repository string.
Example prototype: static char *guess_dir_name(const char *repo)
Returns the path name.
- A function to ensure repo is absolute or relative to parent.
Example prototype: static char *get_real_repo(const char *repo)
Returns the correct repo name.
- A function for normalizing the path, that is, removing repeated
slashes (//), a leading ./, any /./ and /../, and a trailing /.
Example prototype: static char *normalize_path(const char *path)
- A function to check if the path exists and is already a git
repo, else clone it.
Example prototype: static int add_submodule(struct add_data *info)
`add_data` is a struct which stores the config of the submodule.
- A function to set the submodule config properly.
Example prototype: static void config_added_submodule(struct add_data *info)
- After implementation of all these helper methods, the main
`module_add()` function would be implemented using the helper
functions listed above as well as those helper functions which
are predefined.
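As a rough illustration of the first and third helpers above, the
following is a standalone sketch written against plain libc. It is
only an assumption about how they might behave, not the final
implementation: the real patches would use git's own facilities
(such as strbuf and xmalloc) and would have to match the behaviour of
the existing shell code exactly, including edge cases this sketch
glosses over. The function names follow the example prototypes above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sketch only: guess a directory name from a repository string by
 * dropping trailing slashes and a ".git" suffix, then keeping the
 * last component. The caller frees the result.
 */
static char *guess_dir_name(const char *repo)
{
	size_t end, start;
	char *name;

	assert(repo != NULL);
	end = strlen(repo);
	while (end > 0 && repo[end - 1] == '/')
		end--;
	if (end >= 4 && !strncmp(repo + end - 4, ".git", 4))
		end -= 4;
	/* "repo/.git" leaves a separator behind; drop it too */
	while (end > 0 && (repo[end - 1] == '/' || repo[end - 1] == ':'))
		end--;
	start = end;
	while (start > 0 && repo[start - 1] != '/' && repo[start - 1] != ':')
		start--;

	name = malloc(end - start + 1);
	if (!name)
		return NULL;
	memcpy(name, repo + start, end - start);
	name[end - start] = '\0';
	return name;
}

/*
 * Sketch only: lexical normalization. Collapses repeated slashes,
 * drops "." components, resolves "name/.." pairs and strips the
 * trailing slash. The caller frees the result.
 */
static char *normalize_path(const char *path)
{
	size_t base, n = 0;
	const char *p = path;
	char *out;

	assert(path != NULL);
	out = malloc(strlen(path) + 2);
	if (!out)
		return NULL;
	if (path[0] == '/')
		out[n++] = '/';
	base = n; /* 1 for absolute paths, 0 for relative ones */

	while (*p) {
		size_t clen, i;
		int keep = 1;

		while (*p == '/')
			p++;
		clen = strcspn(p, "/"); /* length of the next component */
		if (!clen)
			break;

		if (clen == 1 && p[0] == '.') {
			keep = 0;
		} else if (clen == 2 && p[0] == '.' && p[1] == '.') {
			/* find where the last emitted component starts */
			for (i = n; i > base && out[i - 1] != '/'; i--)
				;
			if (n > base &&
			    !(n - i == 2 && out[i] == '.' && out[i + 1] == '.')) {
				n = i > base ? i - 1 : base; /* pop "name" */
				keep = 0;
			} else if (base) {
				keep = 0; /* cannot go above "/" */
			}
		}
		if (keep) {
			if (n > base)
				out[n++] = '/';
			memcpy(out + n, p, clen);
			n += clen;
		}
		p += clen;
	}
	out[n] = '\0';
	return out;
}
```

With this sketch, guess_dir_name("https://example.com/user/project.git")
gives "project", and normalize_path("./foo//bar/") gives "foo/bar".
Keeping each helper free of side effects like this is what would let
them be sent and reviewed as separate small patches.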
=================================================
##Project Timeline##
"Present Day - May 17"
I’ll use this time to explore the codebase further and
solve more issues, which would help me familiarize
myself with the codebase. I’ll also try to work out a more
detailed workflow and come up with a draft patch
based on the previous work and discussions.
"May 17 - June 7 (Community bonding period)"
- Get familiar with the community.
- Discuss proper workflow with mentors.
- Make changes in the timeline, if necessary.
- Discuss the structure of the series of patches.
"June 7 - June 25 (Initial coding phase)"
- Finish implementation of the helper functions.
- Work on a proper structure of the implementation of the
`submodule add` command and implement additional helper
functions if required.
- Write proper documentation for the helper functions.
"June 25 - July 5 (Binding the code)"
This time would be used to code the main `submodule add`
command using all the helper functions implemented in the
initial phase of coding. This includes binding all the code
together and then completing the command through incremental
reviews. The necessary documentation would be updated in
parallel.
"July 5 - July 12 (Initiate porting of command)"
- Discuss how to go about porting the entire submodule script.
- Initiate porting of the `git-submodule.sh` script.
"July 12 - July 16 (Phase 1 evaluation)"
"July 16 - July 26 (Semester exams)"
I will be taking my semester examinations during this
time. I’ll stay in touch with the mentors and set aside
as much time as possible (around 20 hours a week).
"July 26 - August 10 (Porting the complete script)"
This period would be used to complete the conversion of
`git-submodule.sh` into C code and to combine it with
`submodule--helper.c` to make a single `builtin/submodule.c`.
As I’ll be completely free from academics during this period,
I’ll try to make up for as much as possible of the time lost
during the period of July 16 - July 26.
"August 10 - August 16 (Final review and evaluation)"
- Final review by the mentors.
- Apply necessary changes and touch-ups.
- Update any remaining documentation.
"August 16 - August 23 (Submission of final report)"
Additionally: There are places in the original shell script
and C code tagged as `NEEDSWORK`. My aim would be to resolve
these issues within the GSoC period if time permits.
=================================================
##Post GSoC##
After the GSoC period, I plan to continue contributing
to git and look for other issues to work on. I’d look into
the other commands which are pending conversion,
as well as work on the `NEEDSWORK` parts of the code (if I’m
unable to finish them within the GSoC period itself). I also
plan on mentoring new contributors to git and helping them
by reviewing their code and answering their questions.
Regards,
Chinmoy Chakraborty.