Re: [PATCH] fetch: avoid quadratic loop checking for updated submodules

Jeff King <peff@xxxxxxxx> · Mon, 12 Sep 2011 18:49:34 -0400

On Mon, Sep 12, 2011 at 02:25:38PM -0700, Junio C Hamano wrote:

> > +static void calculate_changed_submodule_paths() {
> 
> Locally fixed while queueing; no need to resend:
> 
>         static void calculate_changed_submodule_paths(void)
>         {

Oops, thanks. Old habits die hard, I guess. :)

> I initially thought "eh, what if we have very many refs that may not fit
> on the command line", but this is running the internal rev-list, so there
> is no such issue.
> 
> I however have to wonder if the use of object flags done by this revision
> walking need to be cleared to prevent them from interfering with the
> ancestry walking done in find_common(). Shouldn't this be done in a
> subprocess "rev-list" that is fed these positive and negative refs from
> its standard input?

Maybe. I haven't noticed any such breakage. I don't know if it's because
it's not there, or we don't have sufficient test cases. But if it were a
problem, it would be pretty severe, wouldn't it? The traversal is done
whether you actually have submodules or not, so lots of people should be
seeing it.

I tried to keep the patch as minimal as possible.  If I were writing
from scratch, I think my strategy would be to actually start a
traversal, and mark the commit objects themselves as we find them. I.e.,
the two steps would be:

  - start a traversal
  - for each existing ref
    - lookup_commit_reference, mark as SEEN
  - for each new commit we fetched
    - walk commits from $new_sha1 until we hit SEEN, marking each as
      SEEN as we go.

IOW, instead of keeping "before" and "after" lists, just mark the
commits. But that perhaps exacerbates the protential problem you
mentioned (though we could maybe just use a different flag than SEEN).

To be honest, the whole submodule recursion thing seems a bit confusing
to me. We walk every commit we just fetched looking for it to introduce
a change in a submodule. But we don't actually remember the sha1 it
changed to. And even if we did, we can't just ask a remote for a sha1,
anyway. So with a set of changes like:

  [assume submodule at commit A, superproject at commit B]

  1. Make commit C in submodule repo.

  2. Make commit D in superproject repo.

  3. Make commit E in submodule repo.

  4. Make commit F in superproject repo.

what does it buy us to find out that the submodule changed from "A" to
"C"? We can't actually fetch it. We can only fetch the tips of the
submodule and hope that they include everything we wanted (i.e., both C
and E; which might not be the case of E rewound and is not a descendant
of C).

So since we must accept that we can't necessarily get every intermediate
step, I wonder if we are simply better off diffing the "before" and
"after" state of a particular ref, rather than traversing. It's way
cheaper, and is just as likely to give us the same information (i.e.,
which submodule paths had changed commits). It can be fooled by rewinds,
too, of course:

  1. Make commit C in submodule repo.

  2. Make commit D in superproject repo.

  3. Rewind submodule repo back to A.

  4. Make commit E in superproject repo.

But in that case, you probably can't get commit "C" from the submodule
anyway, so knowing about it is unlikely to help.

I dunno. Maybe there is more going on. This is the first time I've
looked at the problem.

Also also. I was a little turned off by the fact that every fetch is
going to do the equivalent of "git log --raw -m $new --not $old",
whether you have submodules or not. If you fetch a lot of history, this
is a non-trivial price to pay. Is there some way we can detect whether
submodules are being used at all, and not even bother with this
traversal otherwise? Like checking whether .gitmodules exists? I know
virtually nothing about how submodules work.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html