Re: Git fetch bug in git 2.21+ "Could not access submodule '%s'"

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 14 Aug 2019 09:40:40 -0700

Jeff King <peff@xxxxxxxx> writes:

> Because you created a file in the uninitialized submodule directory, it
> fools the is_empty_dir() check. It seems like there should be a more
> robust way to check whether the submodule is initialized. Maybe:
>
> diff --git a/submodule.c b/submodule.c
> index 77ace5e784..748ebe5909 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1294,6 +1294,9 @@ static int get_next_submodule(struct child_process *cp,
>  		if (!S_ISGITLINK(ce->ce_mode))
>  			continue;
>  
> +		if (!is_submodule_active(spf->r, ce->name))
> +			continue;
> +
>  		submodule = submodule_from_path(spf->r, &null_oid, ce->name);
>  		if (!submodule) {
>  			const char *name = default_name_or_path(ce->name);
>
> but that seems to fail t5526's "on-demand works without .gitmodules
> entry" test.
>
> I think is_submodule_populated_gently() more exactly matches what the
> current code is trying to do, like so:
>
> diff --git a/submodule.c b/submodule.c
> index 77ace5e784..4b26faee5d 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1347,8 +1347,7 @@ static int get_next_submodule(struct child_process *cp,
>  			 * An empty directory is normal,
>  			 * the submodule is not initialized
>  			 */
> -			if (S_ISGITLINK(ce->ce_mode) &&
> -			    !is_empty_dir(ce->name)) {
> +			if (is_submodule_populated_gently(ce->name, NULL)) {
>  				spf->result = 1;
>  				strbuf_addf(err,
>  					    _("Could not access submodule '%s'"),
>
> but it feels odd to me. Even if the submodule is not currently checked
> out, we'd presumably still want to do the recursive fetch as long as we
> have a repo under $GIT_DIR/modules?

... which means that we are not interested in "is it populated?" but
in "have we done 'git submodule init' to show interest in it?".  But
since we are walking the in-core index and picking only the gitlink
entries in it in the early part of this loop, we know ce cannot be
anything but a submodule at this point, so we will not be in the "we
are interesteed in the submodule, but the current HEAD and index is
at a commit that does not have it, hence $GIT_DIR/modules/ is the
only place that knows about it" situation.  If we are interested in
it enough to have a repository stashed under $GIT_DIR/modules/, we
should have a submodule there, shouldn't we?

What I do not quite get is that repo_submodule_init(), which is
called by get_submodule_repo_for(), looks into $GIT_DIR/modules/,
according to the in-code comment of that function.  So "we cannot
get the repo for it, which is an error condition, but we will
complain only for non-empty directory" logic feels iffy.

Stepping back even a bit more, "an empty directory is normal" makes
some sense.  If the user or the build system created a non-directory
at a path where a populated submodule would sit, that would not be
good.  If the user or the build system created a random file in the
unpopulated empty directory, in which the working tree files of the
submodule would be created once the submodule getspopulated, that
would be equally bad, wouldn't it?  Why is the user mucking with
that directory in the first place, and isn't the flagging of the
situation as an error, done with 26f80ccf ("submodule: migrate
get_next_submodule to use repository structs", 2018-11-28), a
bugfix?  If not, why not?

Puzzled....