Re: [PATCH v2 1/1] rm: stage submodule removal from '.gitmodules' when using '--cached'

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 22 Feb 2021 10:58:51 -0800

Shourya Shukla <periperidip@xxxxxxxxx> writes:

> Currently, using 'git rm --cached <submodule>' removes the submodule
> <submodule> from the index and leaves the submodule working tree
> intact in the superproject working tree, but does not stage any
> changes to the '.gitmodules' file, in contrast to 'git rm <submodule>',
> which removes both the submodule and its configuration in '.gitmodules'
> from the worktree and index.
>
> Fix this inconsistency by also staging the removal of the entry of the
> submodule from the '.gitmodules' file, leaving the worktree copy intact,

The "also" above felt a bit puzzling at first, as we would be
removing the entry only from the indexed copy without touching the
working tree, but it refers to "remove the submodule's directory and
also its entry", so it is OK.

By the way, I try to be precise in terminology between worktree and
working tree, and please follow suit.  A working tree is what you
have in a non-bare repository that lets you run "less", "gcc", etc.
on the files checked out.  A worktree refers to the mechanism that
lets you have a separate working tree by borrowing from a
repository, or refers to an instance of a working tree plus .git
file created by the mechanism.  You mean "working tree" in the
quoted text above.

> diff --git a/builtin/rm.c b/builtin/rm.c
> index 4858631e0f..5854ef0996 100644
> --- a/builtin/rm.c
> +++ b/builtin/rm.c
> @@ -254,7 +254,7 @@ static struct option builtin_rm_options[] = {
>  int cmd_rm(int argc, const char **argv, const char *prefix)
>  {
>  	struct lock_file lock_file = LOCK_INIT;
> -	int i;
> +	int i, removed = 0;
>  	struct pathspec pathspec;
>  	char *seen;
>  
> @@ -365,30 +365,33 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
>  	if (show_only)
>  		return 0;
>  

> +	for (i = 0; i < list.nr; i++) {
> +		const char *path = list.entry[i].name;
> +		if (list.entry[i].is_submodule) {
> +			/*
> +			 * Then, unless we used "--cached", remove the filenames from
> +			 * the workspace. If we fail to remove the first one, we
> +			 * abort the "git rm" (but once we've successfully removed
> +			 * any file at all, we'll go ahead and commit to it all:
> +			 * by then we've already committed ourselves and can't fail
> +			 * in the middle)
> +			 */
> +			if (!index_only) {
> +				struct strbuf buf = STRBUF_INIT;
>  				strbuf_reset(&buf);
>  				strbuf_addstr(&buf, path);
>  				if (remove_dir_recursively(&buf, 0))
>  					die(_("could not remove '%s'"), path);
>  
>  				removed = 1;
> +				strbuf_release(&buf);

OK, so this part now only deals with the submodule directory.

>  			}
> +			if (!remove_path_from_gitmodules(path, index_only))
> +				stage_updated_gitmodules(&the_index);

And the entry for it in .gitmodules is handled by the helper,
whether --cached or not.

This somehow feels wrong for the index_only case; doesn't the helper
take contents from the .gitmodules in the working tree and add it to
the index?

Unless you touched stage_updated_gitmodules() not to do that in the
index_only case, that is.

> +			continue;

And that is all for what is done to a submodule.

Makes sense so far.

> +		}
> +		if (!index_only) {
>  			if (!remove_path(path)) {
>  				removed = 1;
>  				continue;
> @@ -396,9 +399,6 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
>  			if (!removed)
>  				die_errno("git rm: '%s'", path);
>  		}
> -		strbuf_release(&buf);
> -		if (gitmodules_modified)
> -			stage_updated_gitmodules(&the_index);

OK, because this should have been done where we called
remove_path_from_gitmodules().

>  	}
>  
>  	if (write_locked_index(&the_index, &lock_file,
> diff --git a/submodule.c b/submodule.c
> index 9767ba9893..6ce8c8d0d8 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -131,7 +131,7 @@ int update_path_in_gitmodules(const char *oldpath, const char *newpath)
>   * path is configured. Return 0 only if a .gitmodules file was found, a section
>   * with the correct path=<path> setting was found and we could remove it.
>   */
> -int remove_path_from_gitmodules(const char *path)
> +int remove_path_from_gitmodules(const char *path, int index_only)
>  {
>  	struct strbuf sect = STRBUF_INIT;
>  	const struct submodule *submodule;
> @@ -149,7 +149,8 @@ int remove_path_from_gitmodules(const char *path)
>  	}
>  	strbuf_addstr(&sect, "submodule.");
>  	strbuf_addstr(&sect, submodule->name);
> -	if (git_config_rename_section_in_file(GITMODULES_FILE, sect.buf, NULL) < 0) {
> +	if (git_config_rename_section_in_file(index_only ? GITMODULES_INDEX :
> +					      GITMODULES_FILE, sect.buf, NULL) < 0) {
>  		/* Maybe the user already did that, don't error out here */
>  		warning(_("Could not remove .gitmodules entry for %s"), path);
>  		strbuf_release(&sect);

When !index_only, do we have any guarantee that .gitmodules in the
working tree and .gitmodules in the index are in sync?  I somehow
doubt it.  

I would have expected that the updated remove_path_from_gitmodules()
would look more like:

 - only if !index_only, nuke the section for the submodule in
   .gitmodules in the working tree.

 - nuke the section for the submodule in .gitmodules in the
   index.

IOW, there would be two git_config_rename_section_in_file() calls to
remove the section in the !index_only case.
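
To make the two-step shape concrete, here is a toy sketch in C.  It
is not git code: struct toy_gitmodules, toy_remove_section() and
toy_remove_path_from_gitmodules() are hypothetical names, and each
copy of .gitmodules is modelled as a plain list of section names.
Attempting both removals and reporting failure if either one fails
is an assumption of the sketch, not something the patch or this
review specifies.

```c
#include <assert.h>
#include <string.h>

#define MAX_SECTIONS 8

/* Toy stand-in for one copy of .gitmodules: a list of section names. */
struct toy_gitmodules {
	const char *section[MAX_SECTIONS];
	int nr;
};

/* Drop the named section; return 0 if it was found, -1 otherwise. */
static int toy_remove_section(struct toy_gitmodules *gm, const char *name)
{
	int i;

	for (i = 0; i < gm->nr; i++) {
		if (!strcmp(gm->section[i], name)) {
			/* Close the gap left by the removed entry. */
			memmove(&gm->section[i], &gm->section[i + 1],
				(gm->nr - i - 1) * sizeof(gm->section[0]));
			gm->nr--;
			return 0;
		}
	}
	return -1;
}

/*
 * Edit the working tree copy only when !index_only, and always edit
 * the index copy.  Neither copy is ever written over the other.
 */
static int toy_remove_path_from_gitmodules(struct toy_gitmodules *working_tree,
					   struct toy_gitmodules *index,
					   const char *name, int index_only)
{
	int ret = 0;

	assert(working_tree && index && name);
	if (!index_only)
		ret |= toy_remove_section(working_tree, name);
	ret |= toy_remove_section(index, name);
	return ret;
}
```

With index_only set, only the index copy loses the section; without
it, both copies lose the section independently of each other.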

Doing so would also mean that you should not have the caller call
stage_updated_gitmodules() at all, even in the !index_only case.
Imagine that the .gitmodules file in the working tree has local
changes (e.g. a few more submodules registered, or the url field of
a few submodules updated) that are not yet added to the index when
"git rm" removes a submodule.  The user does not want them to be in
the index yet, and "git rm" should not add these unrelated local
changes to the index.
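
The scenario can be played out in a toy model in C.  Everything here
is hypothetical and only for illustration: each copy of .gitmodules
is a bitmask with one bit per submodule section, and "staging" is
modelled as copying the working tree mask over the index mask, on
the assumption that re-adding the file takes its whole contents.

```c
#include <assert.h>

/* Hypothetical sections; SUB_NEW is a local, not yet staged, addition. */
enum { SUB_A = 1 << 0, SUB_B = 1 << 1, SUB_NEW = 1 << 2 };

struct toy_state {
	unsigned working_tree;	/* sections in .gitmodules on disk */
	unsigned index;		/* sections in the staged .gitmodules */
};

/* Remove from the working tree copy, then stage that whole file. */
static void toy_rm_then_restage(struct toy_state *s, unsigned sub)
{
	assert(s);
	s->working_tree &= ~sub;
	s->index = s->working_tree;	/* unrelated edits get staged too */
}

/* Remove the section from each copy on its own. */
static void toy_rm_each_copy(struct toy_state *s, unsigned sub)
{
	assert(s);
	s->working_tree &= ~sub;
	s->index &= ~sub;
}
```

Starting from a working tree holding SUB_A, SUB_B and SUB_NEW and an
index holding SUB_A and SUB_B, removing SUB_A with the first
function leaves SUB_NEW in the index, while the second leaves only
SUB_B there and keeps SUB_NEW as an unstaged local change.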

Thanks.