Re: Efficiency of initial clone from server

"Jon Smirl" <jonsmirl@xxxxxxxxx> · Mon, 12 Feb 2007 09:31:48 -0500

On 2/12/07, Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote:
> Hi,
>
> On Sun, 11 Feb 2007, Junio C Hamano wrote:
>
> > You are assuming everybody does initial clone all the time.  I do not
> > think that holds true in practice.
>
> It depends how you interpret "all the time". What you (Junio) are
> suggesting is that the count of initial clones is relatively small as
> compared to the total number of fetches.
>
> However, you can interpret "all the time" in terms of "time". Most fetches
> are really small. They even often end up in no objects pulled at all.
> These are cheap for the server. The initial clones take a long time. They
> are expensive.
>
> I'd be interested to learn how much of the CPU time is actually spent in
> initial clones, rather than other types of fetches. It might make sense
> yet to optimize initial clones.

I don't think CPU is a problem at kernel.org, but disk IO definitely
is. The initial clones cause several minutes (sometimes 10 min or more
when kernel.org is loaded) worth of disk IO. They also totally
thrash the kernel.org cache. The alternative of using a clone to
trigger a repack would go through this once, and then use sendfile (is
gitd that smart?) to send the packs. Sendfile uses the smallest cache
required.
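
To make the sendfile idea concrete, here is a minimal sketch in Python. It
is not how gitd works (whether it uses sendfile at all is the open question
above); send_pack and its arguments are hypothetical names for
illustration. It only shows an existing pack file being handed to the
kernel for transmission instead of being read into the server process
first.

```python
import os
import socket
import tempfile


def send_pack(sock: socket.socket, pack_path: str) -> int:
    """Stream one existing pack file over a connected stream socket.

    socket.sendfile() uses os.sendfile() where the platform supports it,
    so the file data is not copied through a user-space buffer; it falls
    back to plain send() otherwise. Returns the number of bytes sent.
    """
    with open(pack_path, "rb") as pack:
        return sock.sendfile(pack)
```

The socket has to be a blocking SOCK_STREAM socket for socket.sendfile()
to accept it.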

Why doesn't clone copy the existing packs down first with sendfile,
then build a small pack for what is left, and so avoid the initial step
of making a giant pack? Isn't clone going to break when the repo exceeds
2GB?
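
A rough sketch of the split I have in mind, in Python. Everything here is
an assumption for illustration: plan_clone, the pack names and the object
ids are made up, and the rule "only send a pack verbatim if the client
wants everything in it" is just one possible policy, not something git
does today.

```python
from typing import Dict, List, Set, Tuple


def plan_clone(
    packs: Dict[str, Set[str]], wanted: Set[str]
) -> Tuple[List[str], Set[str]]:
    """Split a clone into 'send these packs as-is' and 'pack the rest'.

    packs  maps a pack file name to the object ids it contains.
    wanted is the set of object ids the client needs.
    """
    to_send: List[str] = []
    covered: Set[str] = set()
    for name in sorted(packs):
        objects = packs[name]
        # Assumed policy: ship a pack verbatim only if every object in it
        # is wanted, so the client never receives unrequested objects.
        if objects and objects <= wanted:
            to_send.append(name)
            covered |= objects
    # Whatever the verbatim packs do not cover goes into one small new pack.
    leftover = wanted - covered
    return to_send, leftover
```

The packs in to_send could go out with sendfile untouched, and only the
leftover set would need the expensive pack-building step.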

> Ciao,
> Dscho

--
Jon Smirl
jonsmirl@xxxxxxxxx