On 2/12/07, Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote:
> Hi,
>
> On Sun, 11 Feb 2007, Junio C Hamano wrote:
>
> > You are assuming everybody does initial clone all the time. I do not
> > think that holds true in practice.
>
> It depends how you interpret "all the time". What you (Junio) are
> suggesting is that the count of initial clones is relatively small as
> compared to the total number of fetches.
>
> However, you can interpret "all the time" in terms of "time". Most
> fetches are really small. They even often end up in no objects pulled
> at all. These are cheap for the server. The initial clones take a long
> time. They are expensive.
>
> I'd be interested to learn how much of the CPU time is actually spent in
> initial clones, rather than other types of fetches. It might make sense
> yet to optimize initial clones.
I don't think CPU is a problem at kernel.org, but disk IO definitely is. An initial clone causes several minutes of disk IO (sometimes 10 minutes or more when kernel.org is loaded), and it completely thrashes the kernel.org cache.

The alternative of using a clone to trigger a repack would pay that cost once; after that, the server could use sendfile (is gitd that smart?) to send the existing packs. sendfile keeps cache use to a minimum, since the kernel copies the file from the page cache to the socket without an extra user-space buffer.

Why doesn't clone first copy the existing packs down with sendfile, then build a small pack for whatever is left? That would avoid the initial step of building one giant pack. Also, isn't clone going to break when the repo exceeds 2GB?
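As a minimal sketch of the sendfile idea, assuming Linux sendfile(2): the hypothetical helper `send_pack_file()` below streams an existing pack file to a connected socket without reading it into a user-space buffer. It is not git-daemon's actual code; the name, return convention and error handling are illustrative assumptions.

```c
/* Hypothetical helper, not git-daemon code: stream a whole pack file
 * to a client with Linux sendfile(2). */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h> /* callers typically pass a connected socket */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Send the file at `path` to `out_fd`.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
static int send_pack_file(int out_fd, const char *path)
{
    assert(path != NULL);

    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        int saved = errno;
        close(in_fd);
        errno = saved;
        return -1;
    }

    off_t offset = 0;
    while (offset < st.st_size) {
        /* The kernel copies from the page cache straight to out_fd;
         * sendfile() advances `offset` by the bytes it sent. */
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            int saved = errno;
            close(in_fd);
            errno = saved;
            return -1;
        }
        if (n == 0)
            break; /* file shrank while sending; reported as -1 below */
    }

    close(in_fd);
    return offset == st.st_size ? 0 : -1;
}
```

The loop is there for large packs: on Linux a single sendfile() call transfers at most 0x7ffff000 bytes, so a pack larger than about 2 GB always needs several calls, and smaller transfers may also come back short. Before Linux 2.6.33, `out_fd` had to be a socket.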
> Ciao,
> Dscho
--
Jon Smirl
jonsmirl@xxxxxxxxx