Re: Conversion of 'git submodule' to C: need some help

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Thu, 13 Feb 2020 14:33:58 +0100 (CET)

Hi Shourya,

just adding a little to what Abhishek said (which was pretty sound
advice!) below.

On Sun, 9 Feb 2020, Shourya Shukla wrote:

> I am facing some problems and would love some insight on them:
>
> 	1. What exactly are we aiming for in [3]? To replace the function completely
> 	   or just to add some 'repo_submodule_init' functionality?

If you follow the "Git blame" link in the breadcrumb menu, you will get to
the commit that added the TODO:
https://github.com/periperidip/git/commit/18cfc0886617e28fb6d29d579bec0ffcdb439196

Unfortunately, it does not necessarily help me understand what that TODO
is about. So let's analyze the code:

int add_submodule_odb(const char *path)
{
	struct strbuf objects_directory = STRBUF_INIT;
	int ret = 0;
	ret = strbuf_git_path_submodule(&objects_directory, path, "objects/");
	if (ret)
		goto done;
	if (!is_directory(objects_directory.buf)) {
		ret = -1;
		goto done;
	}
	add_to_alternates_memory(objects_directory.buf);
done:
	strbuf_release(&objects_directory);
	return ret;
}

Okay, so this just adds the object database of the submodule to the
in-memory list of alternate object databases, if it exists. If it does not
exist, the submodule is probably _already_ using the superproject's
database.
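To make the flow of `add_submodule_odb()` concrete, here is a minimal standalone sketch in plain C (POSIX). It assumes the submodule's gitdir has already been resolved (the real function delegates that to `strbuf_git_path_submodule()`), and it replaces Git's alternates machinery with a fixed-size array. The names `sketch_add_submodule_odb()`, `is_dir()`, `alternates` and `nr_alternates` are hypothetical and are not Git API.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical stand-in for Git's in-memory list of alternate object stores. */
#define MAX_ALTERNATES 16
static char *alternates[MAX_ALTERNATES];
static int nr_alternates;

/* Returns 1 if `path` exists and is a directory (similar in spirit to Git's is_directory()). */
static int is_dir(const char *path)
{
	struct stat st;

	return !stat(path, &st) && S_ISDIR(st.st_mode);
}

/*
 * Sketch of add_submodule_odb(): given an already-resolved submodule
 * gitdir, register "<gitdir>/objects/" as an additional object store.
 * Returns 0 on success, -1 if there is no objects directory.
 */
int sketch_add_submodule_odb(const char *submodule_gitdir)
{
	/* sizeof() counts the terminating NUL, so this fits path + "/objects/" + NUL. */
	size_t len = strlen(submodule_gitdir) + sizeof("/objects/");
	char *objects = malloc(len);

	if (!objects)
		return -1;
	snprintf(objects, len, "%s/objects/", submodule_gitdir);

	/* Mirrors the is_directory() check: no objects dir means nothing to add. */
	if (!is_dir(objects) || nr_alternates >= MAX_ALTERNATES) {
		free(objects);
		return -1;
	}
	alternates[nr_alternates++] = objects; /* the list now owns the string */
	return 0;
}
```

As in the original, a missing `objects/` directory is reported as -1 rather than treated as a fatal error; the caller decides what that means.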

To understand what I am talking about, have a look at this document:
https://git-scm.com/docs/gitrepository-layout#Documentation/gitrepository-layout.txt-objects

So what does the function do that was suggested as a better alternative?

int repo_submodule_init(struct repository *subrepo,
			struct repository *superproject,
			const struct submodule *sub)
{
	struct strbuf gitdir = STRBUF_INIT;
	struct strbuf worktree = STRBUF_INIT;
	int ret = 0;

	if (!sub) {
		ret = -1;
		goto out;
	}

	strbuf_repo_worktree_path(&gitdir, superproject, "%s/.git", sub->path);
	strbuf_repo_worktree_path(&worktree, superproject, "%s", sub->path);

	if (repo_init(subrepo, gitdir.buf, worktree.buf)) {
		/*
		 * If initialization fails then it may be due to the submodule
		 * not being populated in the superproject's worktree.  Instead
		 * we can try to initialize the submodule by finding its gitdir
		 * in the superproject's 'modules' directory.  In this case the
		 * submodule would not have a worktree.
		 */
		strbuf_reset(&gitdir);
		strbuf_repo_git_path(&gitdir, superproject,
				     "modules/%s", sub->name);

		if (repo_init(subrepo, gitdir.buf, NULL)) {
			ret = -1;
			goto out;
		}
	}

	subrepo->submodule_prefix = xstrfmt("%s%s/",
					    superproject->submodule_prefix ?
					    superproject->submodule_prefix :
					    "", sub->path);

out:
	strbuf_release(&gitdir);
	strbuf_release(&worktree);
	return ret;
}

Ah, that populates a complete `struct repository`! I fear, however, that
our object lookup is currently not tied to such a `struct repository`
instance. So I think that this TODO can only be addressed once a ton more
patch series like
https://lore.kernel.org/git/f1e4da02-9411-8a93-ca62-6d7ae7bf4ae8@xxxxxxxxx/
have made it not only to the Git mailing list, but into `master`.
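Going back to `repo_submodule_init()` for a moment: its two-step gitdir lookup and the way it builds `submodule_prefix` can be illustrated with plain string formatting. This is a sketch under simplifying assumptions. It ignores the extra path handling that `strbuf_repo_worktree_path()` and `strbuf_repo_git_path()` perform, and `xstrfmt_sketch()`, `populated_gitdir()`, `modules_gitdir()` and `submodule_prefix()` are hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a printf-formatted string (Git has xstrfmt() for this). */
static char *xstrfmt_sketch(const char *fmt, ...)
{
	va_list ap;
	int n;
	char *buf;

	va_start(ap, fmt);
	n = vsnprintf(NULL, 0, fmt, ap); /* measure first */
	va_end(ap);
	if (n < 0)
		return NULL;
	buf = malloc((size_t)n + 1);
	if (!buf)
		return NULL;
	va_start(ap, fmt);
	vsnprintf(buf, (size_t)n + 1, fmt, ap); /* then format for real */
	va_end(ap);
	return buf;
}

/* First attempt: the gitdir inside a populated worktree, "<worktree>/<path>/.git". */
char *populated_gitdir(const char *super_worktree, const char *sub_path)
{
	return xstrfmt_sketch("%s/%s/.git", super_worktree, sub_path);
}

/* Fallback: "<superproject gitdir>/modules/<name>"; such a submodule has no worktree. */
char *modules_gitdir(const char *super_gitdir, const char *sub_name)
{
	return xstrfmt_sketch("%s/modules/%s", super_gitdir, sub_name);
}

/* Same composition as xstrfmt("%s%s/", ...) in repo_submodule_init(). */
char *submodule_prefix(const char *super_prefix, const char *sub_path)
{
	return xstrfmt_sketch("%s%s/", super_prefix ? super_prefix : "", sub_path);
}
```

For a submodule `inner` nested inside a submodule `lib`, the prefixes come out as `lib/` and then `lib/inner/`, because each level appends `<path>/` to its parent's prefix. Note that the first lookup uses the submodule's *path* while the fallback uses its *name*, which is why `struct submodule` carries both.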

> 	2. Something I inferred was that functions with names of the pattern 'strbuf_git_*'
> 	   are trying to 'create a path' (are they physically creating the path or just
> 	   instructing git about them?) while functions of the pattern 'git_*' are trying
> 	   to check some conditions denoted by their function names (for instance
> 	   'git_config_rename_section_in_file')? Is this inference correct to some extent?

All `strbuf_*()` functions work on our "string class", `struct strbuf` (I
forget who said it, but it is true that any sufficiently advanced C
project sooner or later develops its own string data type).
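For readers new to the codebase, the usage pattern of `struct strbuf` (initialize with `STRBUF_INIT`, append to it, call `strbuf_release()` when done, as in `add_submodule_odb()` above) can be shown with a deliberately tiny stand-in. `struct mini_strbuf` and its functions are hypothetical; the real type in `strbuf.h` has many more helpers and differs in details, for example its `buf` never points to NULL.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of struct strbuf: a growable, NUL-terminated buffer. */
struct mini_strbuf {
	size_t alloc; /* bytes allocated for buf */
	size_t len;   /* string length, excluding the NUL */
	char *buf;
};

#define MINI_STRBUF_INIT { 0, 0, NULL }

/* Ensure room for `extra` more bytes plus a NUL; abort on allocation failure. */
static void mini_strbuf_grow(struct mini_strbuf *sb, size_t extra)
{
	size_t need = sb->len + extra + 1;

	if (need <= sb->alloc)
		return;
	if (need < 2 * sb->alloc)
		need = 2 * sb->alloc; /* grow geometrically to keep appends cheap */
	sb->buf = realloc(sb->buf, need);
	if (!sb->buf)
		abort();
	sb->alloc = need;
}

/* Append printf-style formatted text. */
void mini_strbuf_addf(struct mini_strbuf *sb, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (n < 0)
		return;
	mini_strbuf_grow(sb, (size_t)n);
	va_start(ap, fmt);
	vsnprintf(sb->buf + sb->len, (size_t)n + 1, fmt, ap);
	va_end(ap);
	sb->len += (size_t)n;
}

/* Free the buffer and reset to the initial, empty state. */
void mini_strbuf_release(struct mini_strbuf *sb)
{
	free(sb->buf);
	sb->buf = NULL;
	sb->alloc = sb->len = 0;
}
```

Tracking `len` separately from the NUL terminator is what lets such a type append in amortized constant time without rescanning the string.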

To know whether the functions in question create a path or not, you will
have to find their documentation in the appropriate header file (usually
`strbuf.h`, but the `strbuf_git_path*()` family is declared in `path.h`),
or absent that, find and understand their implementation (usually in
`strbuf.c`, or `path.c` for that family).

> 	3. How does one check which parts of a command have been completed? Is it checked
> 	   by looking at the file history or by comparing with the shell script of the command,
> 	   or are there any other means?

You mean whether a scripted command has been completely converted to C?
There is no universal way to do that.

In the case of `git submodule`, I would say that a subcommand is converted
successfully when everything except the command-line option parsing has
been moved into the `submodule--helper`. Eventually, `git-submodule.sh`
will only have functions that parse command-line options and then pass
the result on to the helper. At that point, the command-line option
parsing can _also_ be moved into the helper. Or maybe even the entire
script in one go; I am not sure how big a patch that would be.
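The end state described above, where `git-submodule.sh` only parses options and the helper does the work, amounts to dispatching subcommand names to C functions. Here is a minimal sketch with hypothetical names (`helper_dispatch()`, `sub_init()`, `sub_sync()`). Git's `builtin/submodule--helper.c` uses a name-to-function table in a similar spirit, but its details differ. The exit code 129 mirrors what Git's usage errors return.

```c
#include <stdio.h>
#include <string.h>

typedef int (*subcommand_fn)(int argc, const char **argv);

struct subcommand {
	const char *name;
	subcommand_fn fn;
};

/* Placeholders: a real subcommand would parse argv and do the work in C. */
static int sub_init(int argc, const char **argv)
{
	(void)argv;
	printf("init called with %d argument(s)\n", argc);
	return 0;
}

static int sub_sync(int argc, const char **argv)
{
	(void)argv;
	printf("sync called with %d argument(s)\n", argc);
	return 0;
}

static const struct subcommand subcommands[] = {
	{ "init", sub_init },
	{ "sync", sub_sync },
};

/* argv[0] is the subcommand name, as the shell script would pass it on. */
int helper_dispatch(int argc, const char **argv)
{
	size_t i;

	if (argc < 1) {
		fprintf(stderr, "usage: helper <subcommand> [<args>...]\n");
		return 129;
	}
	for (i = 0; i < sizeof(subcommands) / sizeof(subcommands[0]); i++)
		if (!strcmp(argv[0], subcommands[i].name))
			return subcommands[i].fn(argc - 1, argv + 1);
	fprintf(stderr, "unknown subcommand: %s\n", argv[0]);
	return 129;
}
```

Once option parsing moves into C as well, each `sub_*()` function would parse its own options (in Git, typically with `parse_options()`) instead of receiving arguments the script already parsed.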

> 	4. Is it fine if I am not able to understand the purpose of certain functions right now (such as
> 	   'add_submodule_odb')? I am able to get a rough idea of what the functions are doing, but I am
> 	   not able to decode certain functions line by line.

It is okay not to understand all the details, but if you want to work on
the code, you will need to understand at least the purpose, and if you
want to come up with a project plan (e.g. for GSoC), it will be _really_
helpful to form an understanding of the implementation details, too.

> Currently, I am studying 'git objects' and the submodule command in depth in the Git documentation.
> What else would you advise me to do to strengthen my understanding of the code and Git in general?

I don't know what in particular you want to strengthen. Typically, a good
way to learn enough about the code base in preparation for Google Summer
of Code or Outreachy is to read the code, and whenever anything is
unclear, try to learn about the data structures and/or the underlying
design by studying the files in `Documentation/` (in particular in the
`technical/` subdirectory) whose names seem relevant.

Ciao,
Johannes

>
> Regards,
> Shourya Shukla
>
> [1]: https://github.com/periperidip/git/blob/v2.25.0/submodule.c
> [2]: https://lore.kernel.org/git/20200201173841.13760-1-shouryashukla.oo@xxxxxxxxx/
> [3]: https://github.com/periperidip/git/blob/v2.25.0/submodule.c#L168