On Fri, Aug 23, 2019 at 6:59 PM <randall.s.becker@xxxxxxxxxx> wrote:
>
> Hi All,
>
> I'm trying to answer a question for a customer on clone performance. They
> are doing at least 2-3 clones a day, of repositories with about 2500 files
> and 10Gb of content. This is stressing the file system.

Can you go into a bit more detail about what "stress" means? Using too much
disk space? Too many IOPS reading/packing? Since you specifically called out
the filesystem, does that mean the CPU/memory usage is acceptable?

Depending on how well-packed the repository is, Git will reuse a lot of the
existing pack (and a "perfectly" packed repository can achieve complete
reuse, with no "Compressing objects" phase at all). Delta islands[1] can
help increase reuse and reduce the need for on-the-fly compression, if the
repository includes a lot of refs that aren't generally cloned.

Another relatively recent addition is uploadpack.packObjectsHook[2], which
can simplify caching of packfiles so they can be reused on subsequent
requests. Whether or not this will be beneficial is likely to be influenced
by how many times the exact same commits are cloned and how much extra disk
space is available for storing cached packs.

Not sure if any of this is helpful, but I hope it will be!

Bryan

[1] https://git-scm.com/docs/git-pack-objects#_delta_islands
[2] https://git-scm.com/docs/git-config#Documentation/git-config.txt-uploadpackpackObjectsHook

> I have tried to convince them that their process is not reasonable and
> that they should stick with existing clones, using branch checkout rather
> than re-cloning for each feature branch. Sadly, I have not been successful
> - not for a lack of trying. Is there any way to improve raw clone
> performance in a situation like this, where status really doesn't matter,
> because the clone's life span is under 48 hours?
>
> TIA,
> Randall
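To make those two suggestions a bit more concrete for the throwaway-clone
case: the delta-island setup is just server-side config plus a repack.
Something along these lines (the refs/heads/ and refs/tags/ patterns below
are only placeholders for whatever ref hierarchies their clients actually
clone; if everything under refs/ gets cloned, islands won't buy much):

    # in the server repository's config
    git config --add pack.island 'refs/heads/'
    git config --add pack.island 'refs/tags/'
    git config repack.useDeltaIslands true

    # repack once so the on-disk deltas respect the islands;
    # later repacks will keep doing so because of the config above
    git repack -ad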
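The uploadpack.packObjectsHook route needs a small wrapper script. Below is
an untested sketch of one way it could look; the paths are made up, cache
eviction is left out entirely, and it only pays off when byte-identical
requests repeat. Note that upload-pack ignores this variable if it is set in
the repository's own config, so it has to go into the system (or another
trusted) config:

    git config --system uploadpack.packObjectsHook /usr/local/bin/pack-objects-cache

And then /usr/local/bin/pack-objects-cache could be something like:

    #!/bin/sh
    # upload-pack passes the original "git pack-objects ..." command line
    # as "$@" and feeds the object request on stdin; whatever we write to
    # stdout goes back to the client as the pack.

    cachedir=/var/cache/git-packs   # must be writable by the server's git user
    mkdir -p "$cachedir"

    request=$(mktemp) || exit 1
    trap 'rm -f "$request"' EXIT
    cat >"$request"

    # key the cache on the arguments plus the full request, so only
    # byte-identical clones are served from the cache
    key=$( (echo "$*"; cat "$request") | sha1sum | cut -d' ' -f1)
    cache="$cachedir/$key.pack"

    if test -s "$cache"
    then
        cat "$cache"
    else
        # generate the pack into the cache first, then stream it out;
        # this buffers the whole pack (losing some streaming latency)
        # but avoids ever caching a truncated pack
        if "$@" <"$request" >"$cache.tmp"
        then
            mv "$cache.tmp" "$cache" && cat "$cache"
        else
            rm -f "$cache.tmp"
            exit 1
        fi
    fi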