On Sun, Aug 28, 2016 at 9:42 PM, W. David Jarvis
<william.d.jarvis@xxxxxxxxx> wrote:
> I've run into a problem that I'm looking for some help with. Let me
> describe the situation, and then some thoughts.

Just a few points that you may not have considered, and I didn't see
mentioned in this thread:

 * Consider having that queue of yours just send the pushed payload
   instead of "pull this", see git-bundle. This can turn this entire
   sync thing into a static file distribution problem (rough sketch at
   the end of this mail).

 * It's not clear from your post why you have to worry about all these
   branches; surely your Chef instances just need the "master" branch,
   so just push that around.

 * If you do need branches, consider archiving stale tags/branches
   after some time. I implemented this where I work: we just have a
   $REPO-archive.git with every tag/branch ever created for a given
   $REPO.git, and delete refs after a certain time (see below).

 * If your problem is that you're CPU-bound on the master, have you
   considered solving this with something like NFS, i.e. replacing
   your ad-hoc replication with just a bunch of "slave" boxes that
   mount the remote filesystem?

 * Or, if you're willing to deal with occasional transitory repo
   corruption (just retry?): rsync.

 * There's no reason why your replication chain needs to be
   single-level if master CPU is really the issue. You could have
   master -> N slaves -> N*X slaves, or some combination thereof.

 * Does it really even matter that your "slave" machines are all
   up-to-date? We have something similar at work, but it's just a
   minutely cronjob that does "git fetch" on some repos, since the
   downstream thing (e.g. the Chef run) doesn't run more than once
   every 30m or whatever anyway (see the crontab line below).
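
For the bundle approach, a rough sketch of what each sync could look
like; the repo paths and the "last-synced" ref are made up for
illustration, and you'd track what you shipped last time yourself.
This also assumes the slave repos are bare mirrors, so fetching into
master isn't refused:

    # on the master: bundle everything newer than the last shipped state
    git -C /srv/git/repo.git bundle create /srv/out/repo.bundle \
        last-synced..master

    # distribute repo.bundle as a plain static file, then on each slave:
    git -C /srv/git/repo.git bundle verify /tmp/repo.bundle
    git -C /srv/git/repo.git fetch /tmp/repo.bundle master:master

The nice property is that the master pays for one bundle/pack-objects
run per push, no matter how many slaves download the result.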
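
The archive scheme I mentioned is roughly the following; the names and
the 90-day cutoff are arbitrary, and this assumes GNU date:

    # mirror every ref into the archive repository
    git -C /srv/git/repo.git push /srv/git/repo-archive.git '+refs/*:refs/*'

    # then delete branches that haven't seen a commit in ~90 days
    cutoff=$(date -d '90 days ago' +%s)
    git -C /srv/git/repo.git for-each-ref \
        --format='%(refname) %(committerdate:unix)' refs/heads |
    while read -r ref date; do
        [ "$ref" = refs/heads/master ] && continue
        [ "$date" -lt "$cutoff" ] && git -C /srv/git/repo.git update-ref -d "$ref"
    done

Everything stays reachable in $REPO-archive.git, so nothing is truly
lost, but the repo the slaves replicate only carries live refs.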
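
And the "good enough" variant is just a crontab entry on each slave,
e.g. (bare mirror repo, path made up):

    # fetch master once a minute; the Chef run picks it up whenever it next runs
    * * * * * git -C /srv/git/repo.git fetch --quiet origin +refs/heads/master:refs/heads/master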