Re: [BUG?] fetch into shallow sends a large number of objects

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 07, 2016 at 03:47:53PM -0800, Junio C Hamano wrote:

> > Is this user error to call "git fetch" without "--depth" in the
> > subsequent cases? Or should git remember that we are in a shallow repo,
> > and presume that the user by default wants to keep things shallow?
> 
> Hmph, you shouldn't, and I somehow thought that you do not, have to
> explicitly say things like "--deepen" to break the original
> shallowness, but your example illustrates that the logic to do so is
> not well thought out.  A new side branch will prevent you from
> hitting an already-known shallow cut-off and traverse down to the
> root.
> 
> Giving a random "depth" in subsequent fetch would however not work
> very well, I suspect, as that is very prone to make the part of the
> history the user originally obtained, and presumably used to build
> her own history, into an island that is unconnected to the updated
> tip of the history.  

Yeah, if we just recorded the original "--depth=1", each fetch would
make its own little island. Eventually your "shallow" file would bloat.
I'm not sure what that would do for the performance of subsequent
operations.

> I also do not offhand think of a good way to use the topology or
> timestamp to figure out the best "depth" to truncate the side branch
> at.  The server side may be able to figure out that things before 'F'
> in your picture is not relevant for a client that has the shallow
> cut-off at 067f265, but the side branch can be forked arbitrarily
> long in the past, or it may not even share the ancient part of the
> history and has its own root commit.

Right. I think conceptually that we want a graph cut, across all
branches. But you can't compute that cut at the time of the original
shallow clone, as we might not even have the side branches yet. So it
has to be computed at the time of the subsequent fetch.

You could make an arbitrary cut across all branches based on the
timestamp of the --shallow bottom. That would create new shallow bottoms
on any parallel lines of development that are brought in, from "around
the same time". But I'm uncomfortable relying on timestamps, and I
suspect there are some pretty bad corner cases with skew.

What if we instead marked the parent of a shallow commit UNINTERESTING?
Something like:

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index abed871..fc7a469 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2780,6 +2780,9 @@ static void get_object_list(int ac, const char **av)
 				unsigned char sha1[20];
 				if (get_sha1_hex(line + 10, sha1))
 					die("not an SHA-1 '%s'", line + 10);
+				handle_revision_arg(xstrfmt("^%s^", line + 10),
+						    &revs, flags,
+						    REVARG_CANNOT_BE_FILENAME);
 				register_shallow(sha1);
 				use_bitmap_index = 0;
 				continue;

In this case it means we pull in all of that side branch, down to the
fork point. If it went back to ancient history (or even had its own root
commit), then so be it; that's what the user asked for.  But I think
this would cover the common case of people shallow-cloning and fetching
a normal workflow.

There are two problems, though:

  1. We use UNINTERESTING for two things in pack-objects: to avoid
     sending the commits, but also to assume that the user has the
     objects, for making thin-pack deltas.

     We want only the former, but not the latter (if you apply my patch
     and try a fetch, you'll see that the client is missing bases from
     the resulting pack).

  2. We've effectively re-shallowed the client at the fork point. We
     have to tell them that somehow, or they're going to complain that
     we didn't send them all of the objects.

So I think the solution to both is that we need to do a _separate_
traversal with all of the positive tips we're going to send, and the
parents of any shallow commits the client has, to find their fork points
(i.e., merge bases). And then we add those fork points to the shallow
list (grafting out their parents), and communicate them to the client to
add to its shallow setup.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]