Re: [GSoC][RFC] proposal: convert git-submodule to builtin script

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Wed, 3 Apr 2019 20:48:28 +0200 (DST)

Hi,

On Tue, 2 Apr 2019, Khalid Ali wrote:

> My name is Khalid Ali and I am looking to convert the git-submodule to
> a builtin C script. The link below contains my first proposal draft
> [1] and my microproject is at [2]. My main concern is that my second
> task is not verbose enough. I am not sure if I should add a specific
> breakdown of large items within the submodule command.

Nice!

Please note that while I used to be the mentor who basically helped all of
the GSoC/Outreachy students through their "convert to built-in" projects
in the recent years, I am not available to mentor this year.

Having said that, I think I can help you to improve your proposal.

When you talk about "Convert each main task in git-submodule into a C
function." and "If certain functionality is missing, add it to the correct
script.", it is a good idea to back that up by concrete examples.

Like, study `git-submodule.sh` and extract the list of "main tasks", and
then mention that in your proposal. I see that you listed 9 main tasks,
but it is not immediately clear whether you extracted that list from the
usage text, from the manual page, or from the script itself. If the latter
(which I think would be the best, given the goal of converting the code in
that script), it would make a ton of sense to mention the function names
and maybe add a permalink to the corresponding code (you could use e.g.
GitHub's permalinks).

And then look at one of those main tasks, think of something that you
believe should be covered in the test suite, describe it, then figure out
whether it is already covered. If it is, mention that, together with the
location, otherwise state which script would be the best location, and
why.

Further, I would like to caution you about "If there is still some
time"... The `git-submodule.sh` script weighs in with just over 1,000
lines. We had three GSoC projects to convert scripts last year, and they
converted scripts' weights (at the time) were 750 lines for
`git-stash.sh`, 674 lines for `git-rebase.sh` and 1,036 lines for
`git-rebase--interactive.sh`, respectively. That last number should be
taken with a big grain of salt, as is not quite the number of lines that
were converted: as part of the GSoC project, the
`git-rebase--preserve-merges.sh` script was split out, never intended to
be converted, but to be deprecated instead (in favor of `git rebase -r`),
and there were "only" some 283 lines to be converted to C remaining after
that.

Out of those three, the project converting the smallest number of lines
clearly got integrated first (and there was actually time to do more stuff
in that project, and those things are partially still being cooked). The
converted `git stash` is still not in `master`...

So... converting 1,000 lines of code is quite a challenge for 3 months.

Having said that, I would not consider your project a failure if even
"only" as much as half of the lines of code were converted to C.

Besides, if you care to have a bit of a deeper look into the
`git-submodule.sh` script, you will see a peculiar pattern in some of the
subcommands, e.g. in `cmd_foreach`:
https://github.com/git/git/blob/v2.21.0/git-submodule.sh#L320-L349

Essentially, it spends two handfuls of lines on option parsing, and then
the real business logic is performed by the `submodule--helper`, which is
*already* a built-in.

Even better: most of that business logic is implemented in a file that has
the very file name you proposed already: `submodule.c`.

So if I were you, I would add a section to your proposal (which in the end
would no doubt dwarf the existing sections) that has as subsections each
of those commands in `git-submodule.sh` that do *not* yet follow this
pattern "parse options then hand off to submodule--helper".

I would then study the commit history of the ones that *do* use the
`submodule--helper` to see how they were converted, what conventions were
used, whether there were recurring patterns, etc.

In each of those subsections, I would then discuss what the
still-to-be-converted commands do, try to find the closest command that
already uses the `submodule--helper`, and then assess what it would take
to convert them, how much code it would probably need, whether it could
reuse parts that are already in `submodule.c`, etc.

> Outside of the draft, I was wondering whether this should be
> implemented through multiple patches to the master branch or through a
> separate, long-running feature branch that will be merged at the end
> of the GSoC timeline?

Judging from past projects to convert scripts to C, I would say that the
most successful strategy was to chomp off manageable parts and move them
from the script to C. I am sure that you will find tons of good examples
for this strategy by looking at the commit history of `git-submodule.sh`
and then searching for the corresponding patches in the Git mailing list
archive (e.g. https://public-inbox.org/git/).

Do not expect those "chomped off" parts to hit `master` very quickly,
though. Most likely, you would work on one patch series (very closely with
your mentor at first, to avoid unnecessary blocks and to get a better feel
for the way the Git community works right from the start), then, when that
patch series is robust and solid and ready to be contributed, you would
send it to the Git mailing list and immediately start working on the next
patch series, all the while the reviews will trickle in. Those reviews
will help you to improve the patch series, and it is a good idea to
incorporate the good suggestions, and to discuss the ones you think are
not necessary, for a few days before sending the next patch series
iteration.

Essentially, you will work in parallel on a few patch series at all times.
Those patch series stack on top of each other, and they should, one after
the other, make it into `pu` first, then, when they are considered ready
for testing into `next`, and eventually to `master`. Whenever you
contribute a new patch series iteration, you then rebase the remaining
patch series on top. Ideally it will look a bit like a fern, with the
first patch series being along the farthest, and each subsequent patch
series at an earlier stage than its predecessor.

Phew. Long mail. Hopefully this amount of information does not scare you.
And maybe some of it will help you with the proposal and/or the project.

Ciao,
Johannes

> Feedback is greatly appreciated!
>
> [1] https://docs.google.com/document/d/1olGG8eJxFoMNyGt-4uMiTD3LjRYx15pttg67AJYliu8/edit?usp=sharing
> [2] https://public-inbox.org/git/20190402014115.22478-1-khalludi123@xxxxxxxxx/
>