Multiple threads of compression

Hi,

I’m asking here informally first, because my information relates
to a quite old version (the one from lenny-backports). A tl;dr
is at the end.

On a multi-core machine, git’s garbage collection, as well as the
pack compression done on the server side when someone clones a
repository remotely, normally runs automatically using multiple
threads of execution.

That may be fine for typical setups, but I have two scenarios
where it isn’t:

ⓐ The machine where I want it to use only, say, 2 of my 4 or 8 cores
  as I’m also running some VMs on the box which eat up a lot of CPU
  and which I don’t want to slow down.

ⓑ The server VM which has been given 2 or 3 VCPUs to cope with all
  the load done by clients, but which is RAM-constrained to only
  512 or, when lucky, 768 MiB. It previously served only http/https
  and *yuk* Subversion, but now, git comes into play, and I’ve
  seen the one server box I have in mind go down *HARD* because git
  ate up all RAM *and* swap when someone wanted to update their clone
  of a repository after someone else committed… well, a ~100 MiB
  binary file they shouldn’t have. (It required manual intervention on the
  server to kill that revision and then the objects coupled with it,
  but even *that* didn’t work, read on for more.)

In both cases, I had to apply a quick hack. The one I can still
reproduce is that, on the first box, I added --threads=2 to the
line calling git pack-objects in /usr/lib/git-core/git-repack,
like this:

   args="$args $local ${GIT_QUIET:+-q} $no_reuse$extra"
   names=$(git pack-objects --threads=2 --keep-true-parents --honor-pack-keep --non-empty --all --reflog $arg
           exit 1

(By the way, wrapping source code at 80c is still the way to go IMHO.)

On the second box, IIRC I added --threads=1, but that box got
subsequently upgraded from lenny to wheezy so any local modification
is lost (luckily, the problem didn’t occur again recently, or at
least I didn’t notice it, save for the VM load going up to 6-8
several times a day).

tl;dr: I would like to have a *global* option for git to restrict
the number of threads of execution it uses. Several subcommands,
like pack-objects, are already equipped with an option for this,
but unfortunately, these are seldom invoked by hand①, so this can’t
work in my situations.

① automatic garbage collection, “git gc --aggressive --prune=now”,
  and cloning are the use cases I have at hand right now.
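
For illustration, here is a minimal sketch of what such a limit could
look like as configuration rather than as a patched git-repack. It
assumes the installed git honours the pack.threads and
pack.windowMemory settings described in git-config(1), which would
need checking against the lenny-backports version. The CONFIG_FILE
path, the pick_threads helper, the "half the cores" rule and the 64m
value are all hypothetical choices made for this example, not
anything git prescribes.

```shell
#!/bin/sh
# Sketch only: write a git config fragment that caps the number of
# pack-objects threads and the delta-window memory.
# CONFIG_FILE is a hypothetical scratch path; values are illustrative.
CONFIG_FILE="${CONFIG_FILE:-./gitconfig.sketch}"

# Use half of the given cores (but at least 1), leaving the rest
# for the VMs running on the same box.
pick_threads() {
    cores=$1
    threads=$((cores / 2))
    [ "$threads" -ge 1 ] || threads=1
    echo "$threads"
}

# CORES may be set by hand; otherwise ask getconf, falling back to 1.
CORES="${CORES:-$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}"
THREADS=$(pick_threads "$CORES")

# Emit the fragment; merge it into the real config by hand.
cat >"$CONFIG_FILE" <<EOF
[pack]
    threads = $THREADS
    windowMemory = 64m
EOF
```

The resulting fragment could then be merged into /etc/gitconfig or
into the ~/.gitconfig of the user the server processes run as; the
one-off equivalent would be something like
“git config --system pack.threads 2”. Whether this also covers the
pack compression triggered by a remote clone, and whether
pack.windowMemory is enough to keep a RAM-constrained VM alive,
remains to be confirmed.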

À propos, while here: is gc --aggressive safe to run on a live,
online-shared repository, or does it break other users accessing
the repository concurrently? (If it’s safe I’d very much like to do
that in a, say weekly, cronjob on FusionForge, our hosting system.)

Thanks in advance!
//mirabilos
