Change set based shallow clone

"Jon Smirl" <jonsmirl@xxxxxxxxx> · Thu, 7 Sep 2006 15:52:59 -0400

Here's a change set based shallow clone scheme I've been thinking
about. Does it have potential?

When the client wants a shallow clone, it starts by telling the server
all of the HEADs and how many change sets down each of those HEADs it
has locally. That's a small amount of data to transmit and it can be
easily tracked. Let's ignore merged branches for the moment.

The client then asks for at least 10 (or N) change sets for each of
the HEADs present at the server.  The server starts from each HEAD and
works backwards until it encounters a change set present on the
client. At that point it will be able to compute efficient deltas to
send.
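
As a rough sketch of the walk described above (not anything git
implements): assume a linear history stored as a child-to-parent map,
and assume every HEAD the client reports is known to the server. The
names client_has and walk_back and the ids c1..c6 are made up for
illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical linear history: change set id -> parent id (None at the root).
# Merges are ignored, as in the description above.
Parents = Dict[str, Optional[str]]


def client_has(parents: Parents, head: str, depth: int) -> Set[str]:
    """Expand one (HEAD, depth) report from the client into change set ids."""
    have: Set[str] = set()
    cur: Optional[str] = head
    while cur is not None and depth > 0:
        have.add(cur)
        cur = parents[cur]
        depth -= 1
    return have


def walk_back(parents: Parents, server_head: str, have: Set[str],
              want: int) -> Tuple[List[str], Optional[str]]:
    """Walk back from server_head for at most `want` change sets.

    Returns the change sets to send (newest first) and the first change
    set found on the client, or None if none was reached.
    """
    to_send: List[str] = []
    cur: Optional[str] = server_head
    while cur is not None and cur not in have and len(to_send) < want:
        to_send.append(cur)
        cur = parents[cur]
    base = cur if cur in have else None
    return to_send, base


history: Parents = {"c1": None, "c2": "c1", "c3": "c2",
                    "c4": "c3", "c5": "c4", "c6": "c5"}
have = client_has(history, "c3", 2)          # client holds c3 and c2
near = walk_back(history, "c6", have, 10)    # reaches c3 within the budget
far = walk_back(history, "c6", have, 2)      # budget runs out first
```

In the "near" case the server can delta c4..c6 against c3. In the
"far" case no common change set is reached within the requested
depth, so the base comes back as None.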

If you haven't updated for six months, then when the server walks
backwards for 10 change sets it's not going to find anything you have
locally. When this situation is encountered the server needs to
generate a delta just for you between one of the change sets it knows
you have and one of the 10 change sets you want. This one-off delta
lets you avoid fetching all of the objects back to a common branch
ancestor. The delta functions as a jump over the intervening space.
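
To illustrate the jump, here is a minimal sketch that models each
change set as a full snapshot (a dict of path to file content) and
diffs two snapshots directly, skipping everything in between. The
snapshot model and the names jump_delta and apply_delta are
assumptions for the example; a real server would delta individual
objects rather than whole trees.

```python
from typing import Dict, List, TypedDict

Tree = Dict[str, str]  # hypothetical snapshot: path -> file content


class Delta(TypedDict):
    changed: Dict[str, str]  # paths added or modified, with new content
    removed: List[str]       # paths that no longer exist


def jump_delta(old: Tree, new: Tree) -> Delta:
    """Diff two snapshots directly, ignoring all intermediate change sets."""
    changed = {p: c for p, c in new.items() if old.get(p) != c}
    removed = sorted(p for p in old if p not in new)
    return {"changed": changed, "removed": removed}


def apply_delta(tree: Tree, delta: Delta) -> Tree:
    """Client side: turn the old snapshot into the new one."""
    gone = set(delta["removed"])
    result = {p: c for p, c in tree.items() if p not in gone}
    result.update(delta["changed"])
    return result


# Snapshot the client has (six months old) and one it wants (recent).
stale: Tree = {"Makefile": "v1", "main.c": "v1", "old.c": "v1"}
recent: Tree = {"Makefile": "v1", "main.c": "v7", "new.c": "v3"}

delta = jump_delta(stale, recent)
rebuilt = apply_delta(stale, delta)
```

The unchanged Makefile never crosses the wire, and none of the six
intermediate revisions of main.c are needed to rebuild the recent
snapshot.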

In the case of an initial shallow clone the client won't have anything
to delta against.  The server will be forced to send a full version
for one of the 10 change sets requested and deltas for the rest.
Getting an initial shallow clone should take about as long as a CVS
checkout.
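
One possible shape for that transfer, as a sketch: send the newest
requested change set in full and express each older one as a delta
against the next newer one. The choice of the newest as the full
version and the chaining order are assumptions; the scheme only
requires one full version plus deltas. initial_clone_plan is a
made-up name.

```python
from typing import List, Optional, Tuple

# (kind, change set id, id of the change set it is a delta against)
Step = Tuple[str, str, Optional[str]]


def initial_clone_plan(change_sets: List[str]) -> List[Step]:
    """Plan an initial shallow clone for change sets listed newest first."""
    plan: List[Step] = []
    for i, cs in enumerate(change_sets):
        if i == 0:
            # Nothing on the client to delta against: send it whole.
            plan.append(("full", cs, None))
        else:
            # Older change sets are deltas against the next newer one.
            plan.append(("delta", cs, change_sets[i - 1]))
    return plan


plan = initial_clone_plan(["c6", "c5", "c4"])
```

Here plan holds one "full" step for c6 followed by "delta" steps for
c5 (against c6) and c4 (against c5).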

This scheme does require the server to sometimes generate custom diffs
for the client, but in all the cases I have been working with
everything is I/O bound, so it is better to spend some CPU to reduce
the I/O needed.

--
Jon Smirl
jonsmirl@xxxxxxxxx