Re: Git Scaling: What factors most affect Git performance for a large repo?

Duy Nguyen <pclouds@xxxxxxxxx> · Wed, 25 Feb 2015 19:02:40 +0700

On Sat, Feb 21, 2015 at 11:01 AM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
> I wonder how efficient rsync is for transferring these refs: the
> client generates a "file" containing all refs, the server does the
> same with their refs, then the client rsync their file to the server..
> The changes between the server and the client files are usually small,
> I'm hoping rsync can take advantage of that.

Some numbers without any actual coding. After the initial clone, we
store the server's refs in a file called base-file on the client. At
the next push or pull, the server saves its refs in new-file. Using
rsync to avoid the initial ref advertisement would involve these steps
(the rdiff command is from librsync):

client> rdiff signature base-file signature
(client sends "signature" file to server)
server> rdiff delta signature new-file delta
(server sends "delta" file back to client)
client> rdiff patch base-file delta new-file
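
To make the three steps concrete, here is a minimal Python sketch of the
same signature/delta/patch round trip. It is a simplified stand-in, not
librsync's algorithm or file format: the function names, the MD5 block
hash and the in-memory op list are all assumptions made for
illustration, and it rehashes at every byte offset instead of using a
cheap rolling checksum the way rsync does.

```python
import hashlib

BLOCK = 2048  # assumed block size, for illustration only


def make_signature(base: bytes, block: int = BLOCK) -> dict:
    """Client side: map the digest of each aligned block of base to its index."""
    sig = {}
    for start in range(0, len(base), block):
        digest = hashlib.md5(base[start:start + block]).digest()
        sig.setdefault(digest, start // block)
    return sig


def make_delta(sig: dict, new: bytes, block: int = BLOCK) -> list:
    """Server side: describe new as ('copy', block_index) and ('data', bytes) ops."""
    ops = []
    literal = bytearray()
    pos = 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        index = sig.get(hashlib.md5(chunk).digest())
        if index is not None:
            # The client already has this block: flush pending literals, then copy.
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", index))
            pos += len(chunk)
        else:
            # No match at this offset: send one byte literally and slide forward.
            literal.append(new[pos])
            pos += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_patch(base: bytes, ops: list, block: int = BLOCK) -> bytes:
    """Client side: rebuild new-file from base-file and the delta ops."""
    out = bytearray()
    for kind, arg in ops:
        if kind == "copy":
            out += base[arg * block:(arg + 1) * block]
        else:
            out += arg
    return bytes(out)


if __name__ == "__main__":
    base = b"".join(b"%040d refs/heads/topic-%d\n" % (i, i) for i in range(1000))
    new = base.replace(b"topic-500\n", b"topic-5xx\n")
    ops = make_delta(make_signature(base), new)
    print(apply_patch(base, ops) == new)
    print(sum(len(arg) for kind, arg in ops if kind == "data"), "literal bytes")
```

Only the signature (client to server) and the ops (server to client)
would cross the network; every "data" op is the part that actually
costs bandwidth.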

The files exchanged over the network are "signature" and "delta". I
used my git.git's packed-refs as the base-file (1416 refs, 78789 bytes)
and modified three lines to create new-file. That produced a signature
file of 480 bytes and a delta file of 6163 bytes. That's 7% of the size
of the new file. Good.

When I modified more lines in new-file (15 lines), the delta file grew
to 26644 bytes (the "signature" file stays the same because it depends
only on base-file). Total transferred bytes were 60% of the size of
new-file. Less impressive. Maybe there are some tuning options for
better results...
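
A rough model may explain why the delta grows so quickly. It assumes
(this is not stated above) that rdiff ran with a 2048-byte block size
and that each signature entry is a 4-byte weak sum plus an 8-byte
strong sum after a 12-byte header; those constants are picked because
they reproduce the 480-byte signature reported for the 78789-byte
file. Under that model every block touched by a changed line is resent
whole, however small the change.

```python
import math


def signature_size(file_len, block=2048, strong_len=8, header=12):
    """Estimated signature size: a header plus one (weak, strong) pair per block."""
    blocks = math.ceil(file_len / block)
    return header + blocks * (4 + strong_len)


def literal_bytes(dirty_blocks, block=2048):
    """Estimated literal data in the delta: each dirty block is sent in full."""
    return dirty_blocks * block


print(signature_size(78789))   # 480, matching the reported signature size
print(literal_bytes(3))        # 6144, close to the 6163-byte delta
print(literal_bytes(13))       # 26624, close to the 26644-byte delta

# Trade-off with a smaller (hypothetical) block size: less literal data
# per changed line, but a larger signature to upload first.
print(signature_size(78789, block=256), literal_bytes(13, block=256))
```

If the assumptions hold, the second delta would be consistent with the
15 changed lines landing in 13 distinct blocks, and a smaller block
size would trade a bigger signature for a smaller delta.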

The same process could be used to transfer all of the client's refs to
the server instead of sending lots of "have" lines. I suspect there
will be more changes between the client's "have" file and the server's
ref list. If the changes are spread out and cause a lot of blocks to be
sent, the savings would not be as high as I'd hoped. I guess that's it
for the rsync idea.
-- 
Duy