[RFC] [GSoC] Proposal Draft for GSoC 2017 : Incremental Rewrite of git-submodules

Hello everyone,

This is the first draft of my project proposal. I decided to change my
project because the one I initially intended to do, and had also proposed,
was slightly too small for 13 weeks. When I came across this project and
found that there are leftover bits (git submodule $cmd $pathspec may want
to error out when the $pathspec does not match any submodules) which can
also be included in this conversion, I was motivated to switch and wrote
my proposal around it.

I still wish to complete the previously proposed task, probably after
finishing GSoC.

Here is a Google doc for my proposal,
https://docs.google.com/document/d/1krxVLooWl--75Pot3dazhfygR3wCUUWZWzTXtK1L-xU

Below I have also included a text-only version of my proposal, so that
all the discussion related to it is also available on the public-inbox of
git. It would be great to have your suggestions so that I can improve it
further.

Thanks,
Prathamesh Chavan
---

Incremental Rewrite of git-submodules
01.04.2017

About Me

Name              Prathamesh Chavan
University        Indian Institute of Technology, Kharagpur
Major             Computer Science and Engineering (Dual Degree)
Email             pc44800@xxxxxxxxx
IRC               pratham_pc
Blog              pratham-pc.github.io
Contact No.       +91-993-235-8333
Time Zone         IST (UTC +5:30)


Background

I am a second-year student in the Department of Computer Science and
Engineering, IIT Kharagpur. I have been an active member of the Kharagpur
Open Source Society and of the Kharagpur Linux Users Group since my first
year, which is also when I was introduced to open source, Linux and git.
I used to try to complete every task on my laptop without using the
cursor, and so I eventually became familiar with shell scripting; I still
find something new every day. I am also quite familiar with the C
language. I have always wanted to get involved in the development
process, and Google Summer of Code is a great way to do that, so I would
like to participate.
This is my first attempt at GSoC. Since I use Git on a regular basis and
will continue to use it, I would love to become one of its contributors
and to keep contributing to the project even after GSoC.

The Project

Abstract

Developers frequently use the git submodule commands to manage external
dependencies in their projects. Most of the subcommands of git submodule
are still written in shell script (all but git submodule init), so my
project intends to convert the subcommands into C code, thus making them
builtins. This will improve Git's portability and the efficiency of the
git-submodule commands. The function cmd_init has already been ported to
a builtin successfully, and the same needs to be done for the remaining
functions.

Problems encountered while using Shell Scripts

Git submodule subcommands are currently implemented in the shell script
'git-submodule.sh'. There are several reasons to prefer not to use the
shell script:

1. Less Efficient

Being part of a shell script, the git submodule commands do not have
direct access to git's native low-level internal API, which is designed
to carry out these tasks more efficiently. As a result, even trivial
operations require git to be spawned as a separate process with the help
of plumbing functions. Since everything in a shell script comes from
external commands, every step has to go through a fork and exec. Whenever
a fork takes place, the git command requires another fully functioning
POSIX shell. This makes the git submodule commands slow and relatively
inefficient, and also reduces git's portability, since it is difficult to
port such commands to a non-POSIX environment like Windows.

2. Spawning makes the process slower in many cases

Most operations in the shell script require calling external executables,
even for trivial tasks. Looking at 'git-submodule.sh', we see that it
calls many git executables such as 'git config', 'git checkout',
'git add', 'git rev-list', 'git fetch', 'git rev-parse', 'git describe',
'git hash-object' and 'git log'. Since these and the other shell commands
come from separate executables, every command goes through a fork and
exec, which makes the process slow.
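
To illustrate the cost described above, here is a minimal sketch in C
(not taken from git's source). run_external() does roughly what a shell
must do for every external command: fork, exec and wait. path_exists()
performs a comparable trivial check in-process with a single library
call. Both function names are hypothetical and only serve the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] as a separate process and return its exit status,
 * or -1 if it could not be forked or did not exit normally.
 * A shell script pays this fork + exec + wait for each external command.
 */
int run_external(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;		/* fork failed */
	if (pid == 0) {
		execvp(argv[0], argv);	/* returns only on failure */
		_exit(127);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* A comparable trivial check done in-process: no fork, no exec. */
int path_exists(const char *path)
{
	return access(path, F_OK) == 0;
}
```

In a script, something like `test -e "$path"` may take the first route
when 'test' is not a shell builtin, while a builtin written in C can
always take the second.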

3. Additional dependencies are introduced

Looking at the 'git-submodule.sh' file, we can see that it depends on
various other executables such as 'test', 'printf', 'echo', 'sed',
'grep', 'mkdir' and 'rm'. These dependencies then have to be provided in
a non-POSIX environment as well.

4. Coverage checking is not implemented for shell scripts

Since kcov is currently not implemented, checking the coverage of the
shell scripts is still an issue. This was proposed in previous GSoC
proposals, but wasn't implemented. I also looked at its implementation
for the Linux kernel project, but it seems too difficult to cover in my
GSoC proposal. Instead, I would run 'make coverage' after I port all
functions to C. This will also help me find code for which no tests exist
and improve the test coverage of the submodule code.

Brief introduction of git-submodules

Git submodules are used to include a git repository inside a
subdirectory of another git repository. A submodule has its own history,
which does not interfere with the superproject's history. This helps
developers include external dependencies such as third-party libraries in
their project source tree. Submodules have been around for quite a while,
yet there are still areas to work on.
Currently git submodule is implemented across several files:
'git-submodule.sh', 'submodule.c', 'submodule.h' and
'builtin/submodule--helper.c'. To convert the script to a builtin, I
would first port and implement all the algorithmic and non-algorithmic
parts in 'submodule.c' and 'submodule--helper.c' respectively, and then
finally create 'builtin/submodule.c', which can call any function from
submodule--helper.c directly from C. Eventually, the user interface part
will be implemented in builtin/submodule.c, with submodule--helper.c
serving as a plumbing command and submodule.c holding the reusable parts
of the code.
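
As a rough illustration of the user interface layer described above, the
following sketch shows a table-driven subcommand dispatcher in C. It is
only an assumption about how a future builtin/submodule.c could route
subcommands; the names run_subcommand, module_init and module_foreach are
hypothetical, and the handlers are placeholders that do no real work.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef int (*subcommand_fn)(int argc, const char **argv);

struct subcommand {
	const char *name;
	subcommand_fn fn;
};

/* Placeholder handlers: each only reports how many arguments it got. */
int module_init(int argc, const char **argv)
{
	(void)argv;
	printf("init: %d argument(s)\n", argc);
	return 0;
}

int module_foreach(int argc, const char **argv)
{
	(void)argv;
	printf("foreach: %d argument(s)\n", argc);
	return 0;
}

static const struct subcommand subcommands[] = {
	{ "init", module_init },
	{ "foreach", module_foreach },
};

/*
 * Look up argv[0] in the table and call its handler with the remaining
 * arguments. Returns the handler's result, or 1 if argv[0] is missing
 * or is not a known subcommand.
 */
int run_subcommand(int argc, const char **argv)
{
	size_t i;

	if (argc < 1)
		return 1;
	for (i = 0; i < sizeof(subcommands) / sizeof(subcommands[0]); i++)
		if (!strcmp(argv[0], subcommands[i].name))
			return subcommands[i].fn(argc - 1, argv + 1);
	fprintf(stderr, "unknown subcommand: %s\n", argv[0]);
	return 1;
}
```

With such a table, each subcommand ported from the script becomes one
more entry, and the handlers can call shared code directly instead of
spawning a helper process.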

Goals

1. Convert submodule subcommands to C.
2. Finally, create builtin/submodule.c.

Plan

1. Understand the code written in 'git-submodule.sh'

   In the community bonding period, my main aim will be to understand
   the code written in 'git-submodule.sh' and get a clear understanding
   of how each submodule subcommand is implemented. Understanding each
   subcommand's logical flow is very important, since I will be
   retaining it in the rewritten code as well.

2. Understand which APIs could be used, learn about them, and learn
   about the other plumbing functions which will be required.

   In the community bonding period, once I understand the code written
   in 'git-submodule.sh', I would start discussing which procedure is
   appropriate for rewriting the code as a builtin. Rewriting is not
   simply translating the code into another language; it also means
   utilizing the APIs and functions available in the target language and
   creating new APIs to ease future implementations. Here I will learn
   about the various APIs I'll be using during the conversion, clarify
   my doubts about the implementation of the functions, and check
   whether any new APIs need to be created. Also, as submodule--helper
   is a plumbing function, other such functions will be used in the
   conversion, and hence learning about them will be useful.

3. Start Converting Submodule subcommands

   As was done when porting the submodule subcommand 'init', I'll first
   port each subcommand to submodule--helper and call that function from
   git-submodule.sh, until every subcommand is converted. Then I'll make
   the required changes in the command structure to call submodule in
   place of submodule--helper, along with the other changes needed for
   the complete conversion.
   As soon as the conversion of a single function to a builtin is over,
   I'll immediately perform step 4 as well, so that the patches I send
   out are well tested and the documentation is updated at the same time.

4. Testing the rewritten code, optimizing, documenting and other
   changes required

   The rewritten code can be checked with the help of the available
   test suites. Taking into account the bug to be fixed (git submodule
   $cmd $pathspec may want to error out when the $pathspec does not
   match any submodules [1]), new test cases will also need to be added
   to the suite. The Makefile would also require changes to update the
   build process. Once testing is completed, I would start optimizing
   the functions written in C so that the code is more efficient.
   Finally, I would document all the changes. Once all of this is done
   and the patches are sent, the reviews will also help in optimizing
   the code and improving the documentation. I will try to send patches
   as early as possible, since reviewing takes time, and in the meantime
   I can start work on the next function. I will also try my best to get
   the code merged in time, so as to ensure that I'm on the correct path
   at every point.
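
The leftover bit mentioned in step 4 (erroring out when a $pathspec
matches no submodule) could be checked along the following lines. This is
only a sketch under a simplifying assumption: a pathspec "matches" a
submodule if it equals the submodule path or names a leading directory of
it. Git's real pathspec matching is richer than this, and the function
names and the error message are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Simplified matching (an assumption, not git's pathspec rules):
 * spec matches path if it is equal to it or is a leading directory.
 */
int pathspec_matches(const char *spec, const char *path)
{
	size_t len = strlen(spec);

	/* ignore trailing slashes so "lib/" behaves like "lib" */
	while (len > 0 && spec[len - 1] == '/')
		len--;
	if (len == 0)
		return 0;
	if (strncmp(spec, path, len))
		return 0;
	return path[len] == '\0' || path[len] == '/';
}

/* Report every pathspec that matches no submodule; return how many. */
int count_unmatched(const char **specs, int nr_specs,
		    const char **paths, int nr_paths)
{
	int i, j, unmatched = 0;

	for (i = 0; i < nr_specs; i++) {
		int found = 0;

		for (j = 0; j < nr_paths && !found; j++)
			found = pathspec_matches(specs[i], paths[j]);
		if (!found) {
			fprintf(stderr,
				"error: pathspec '%s' did not match any submodule\n",
				specs[i]);
			unmatched++;
		}
	}
	return unmatched;
}
```

A subcommand could call count_unmatched() before doing any work and exit
with a non-zero status when the result is not 0; a new test case would
then assert that exit status.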

Potential difficulties during the conversion

1. The rewritten code may introduce bugs which weren't present before.
   Most of these would be detected by the existing test suites. Apart
   from that, I would stick to the same code structure as in the
   script. If bugs still remain after that, I have allotted time for
   fixing them.
2. As witnessed in the porting of the 'git submodule init' command,
   code written in C takes more lines than the scripted code. This may
   be addressed by creating functions; once we identify recurring code
   structures, we may also want to turn them into a Git API for future
   convenience.

Timeline (Tentative)

April 1 to April 21

Since I started on this project late, in this time period I would like
to convert the submodule subcommand 'foreach' from shell script to a
builtin. In this time span I would also go through the codebase as much
as possible.

April 22 to April 28

End-semester examinations at college. I also have some small class tests
between 5th April and 9th April, but other than these I do not have any
other tests within the described timeline (April 1 to August 31, 2017).

April 29 to May 15:

I would like to spend this time reading the codebase to understand the
software very well and getting my doubts about the implementation
clarified by the mentor. I will also use this period to understand the
various git APIs so that they can be used in the implementation. In this
time span I'll also discuss the way I have decided to proceed, and
address any issues related to that. Since my next semester at college
starts on July 14th, I would prefer to start actual coding on 16th May
itself, which leaves sufficient time for understanding the code and for
community bonding as well.

May 16 - July 3: (Week 1 - Week 7):

There are around 7 subcommands (excluding 'foreach') of git submodule in
'git-submodule.sh'. I would require on average 4 days to complete the
conversion of each subcommand. Finally, I would merge 'submodule.c' and
'builtin/submodule--helper.c'. During the conversion itself, I would
also like to address the bug: git submodule $cmd $pathspec may want to
error out when the $pathspec does not match any submodules. [1]
I will also be keeping some additional buffer time here.


July 4 to July 17: (Week 8 - Week 9):

In this span of 2 weeks, I would like to fix the bugs introduced by the
new code. I will also be working on fixing the issue: git submodule $cmd
$pathspec may want to error out when the $pathspec does not match any
submodules. [1]

July 18 to July 31: (Week 10 - Week 11):

Check the test coverage of the code and improve it wherever required.
I will also work on optimizing the code as required.

August 1 to August 7: (Week 12):

I would keep this final week to polish the code written till then and
give it the final touches as required.

August 8 to August 29 :

I would like to keep this time as a buffer for fixing any bugs created
by the rewrite. If I have time remaining, I would also like to work on
my wish list.

Wish-list

I have already discussed these topics before [2], but since I was more
interested in doing the above project, I'm adding them to my wish list.
1. "git -C sub add ." might behave just like "git add sub"
2. Teach "git -C <submodule-path> status" in an unpopulated
   submodule to report the submodule being unpopulated, do not
   fall back to the superproject.
3. Teach "git log -- <path/into/submodule/and/further>" to behave
   like "git -C <path/into/submodule> log -- <and/further>"

Microproject Attempted

The microproject which I attempted is: “Avoid pipes in git related
commands for test suite”. I hope it will get merged soon.
You can find my patch here [3].

Plans for Summer

I recently underwent ACL-reconstruction surgery, so in the summer I'll
just be continuing the physiotherapy I have been undergoing for the
last 3 months.


[1] : https://git-blame.blogspot.in/p/leftover-bits.html
[2] : https://public-inbox.org/git/CAME+mvW1x6fnGKt1_auGOp+wFYFR=Y_Qhxfd50E7KFe6t+X4kw@xxxxxxxxxxxxxx/
[3] : https://public-inbox.org/git/20170324120433.2890-1-pc44800@xxxxxxxxx/



