Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects

Matthijs Kooijman <matthijs@xxxxxxxx> · Wed, 7 Aug 2013 12:27:16 +0200

Hi Junio,

I haven't got a reply to my mail yet. Could you have a look, so I can
update and resubmit my patch?

On Fri, Jul 12, 2013 at 09:11:57AM +0200, Matthijs Kooijman wrote:
> > [administrivia: you seem to have mail-followup-to that points at you
> > and the list; is that really needed???]
> > In your discussion (including the comment), you talk about "shallow
> > root" (I think that is the same as what we call "shallow boundary"),
> I think so, yes. I mean to refer to the commits referenced in
> .git/shallow, that have their parents "hidden".
Could you confirm that I got the terms right here (or is the shallow
boundary the first hidden commit?)

> > but in this added block, there is nothing that checks CLIENT_SHALLOW
> > or SHALLOW flags to special case that.
> >
> > Is it a good idea to unconditionally do this for all "have"
> > revisions?
> That's what I meant in my mail with "applying the fix unconditionally" -
> there is probably some check needed (I discussed a few options in the
> mail as well).
>
> Note that this entire do_rev_list function is only called when there are
> shallow revisions involved, so there is also a basic "only when shallow"
> check in place.

My proposal was to only apply the fix for all have revisions when the
previous history traversal came across some shallow boundary commits. If
this happens, then that shallow boundary commit will be a "new" one and
it will have prevented the history traversal from finding the full list
of relevant "have" commits. In this case, we should just use all "have"
commits instead.

Now, looking at the code, I see a few options for detecting this case:

 1 Modify mark_edges_uninteresting to return a boolean (or have an
   output argument) if any of the commits in the list of commits to find
   (not the edges) is a shallow boundary.
 2 Modify mark_edges_uninteresting to have a "show_shallow" argument
   that gets called for every shallow boundary. The show_shallow
   function passed would then simply keep a boolean if it is passed at
   least once.
 3 Add another loop over the commits _after_ the call to
   mark_edges_uninteresting, that simply looks for any shallow boundary
   commit.

The last option seems sensible to me, since it prevents modifying the
somewhat generic mark_edges_uninteresting function for this specific
usecase. On the other hand, it does mean that the list of commits is
looped twice, not sure what that means for performance.

Before I go and implement one of these, which option seems best to you?

Gr.

Matthijs
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html