Re: People unaware of the importance of "git gc"?

On Wed, Sep 05, 2007 at 08:50:04AM +0000, Steven Grimm wrote:
> Pierre Habouzit wrote:
> >  Well, independently of the fact that one could expect users to run
> > gc on their own, the big nasty problem with repacking is that it's
> > really slow. And I just can't accept that the git I use to commit
> > blazingly fast will then be unavailable for a very long time (repacks
> > on my projects, which are not as big as the kernel but still, usually
> > take 10 to 20 seconds or more each).
> >  
> 
> What about kicking off a repack in the background at the ends of certain 
> commands? With an option to disable, of course. It could run at a low 
> priority and could even sleep a lot to avoid saturating the system's 
> disks -- since it'd be running asynchronously there should be no problem 
> if it takes longer to run.

  there is an issue with that: repack is memory and CPU intensive. Of
course renicing the process deals with the CPU issue, but not with the
memory one. I've often seen repacks eat 300 to 400 MB of memory on
repositories that are not all that big: it seems (from experience, not
from reading the code) that if you have some big binary blobs (we have
.swf's and .fla's in our repository) git can consume quite a lot of RAM
to (presumably) compute efficient deltas for them.

  Sadly there is no way to "renice" the RAM usage of a process. Once a
repack is launched, it will make your system swap and bring the whole
machine to its knees.
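
  (For what it's worth, if your git is recent enough to have the pack.*
limits, something along these lines might tame a repack's memory
appetite; take it as a sketch, the values are made up and I haven't
measured what it costs in pack quality:)

    # Cap the memory git may use for the delta search window, and shrink
    # the delta cache, instead of letting it hold whole big blobs in RAM.
    # The numbers are arbitrary examples.
    git config pack.windowMemory 64m
    git config pack.deltaCacheSize 32m
    # Shallower delta chains also reduce the work done per object.
    git config pack.depth 20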

> IMO expecting end users to regularly perform what are essentially 
> database administration tasks (running git-gc is akin to rebuilding 
> indexes or packing tables on a DBMS) is naive. Heck, even database 
> administrators don't like to run database administration commands; 

  Well, that's what cron is for. When you install a DBMS on any
reasonable distro, it comes with maintenance scripts installed as cron
jobs, run at a sensible time of day (local time). So the comparison
doesn't hold. And that's exactly the problem: it's quite hard to ship
git with such a maintenance cron task, because we can't know where the
user keeps his repositories or when he works, so you somehow have to
set it up yourself.

  Or you can deal with that with a "rule". At work we keep our
development trees under $HOME/dev/, so the cron job we use is roughly:

    find "$HOME/dev/" -maxdepth 4 -type d -name .git | while read -r repo
    do
        GIT_DIR="$repo" git gc
    done
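
  (The crontab entry itself is nothing fancy; assuming you save that
loop as, say, $HOME/bin/git-gc-all.sh (the name is purely illustrative),
it is just something like:)

    # run the gc sweep every night at 04:00, local time
    0 4 * * * $HOME/bin/git-gc-all.sh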

  As we work on NFS, when a new developer arrives we can simply set up
the cron job for him at a time when he's not expected to be at work,
and that's it. I'm not sure there is a good solution at all.

  Or we could also provide a git-coffee-break command that would tell
git: do whatever you want with this computer for the next 10 minutes,
there won't be anyone watching. But I assume tea lovers will feel
excluded.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@xxxxxxxxxx
OOO                                                http://www.madism.org

Attachment: pgphFmHDChYmI.pgp
Description: PGP signature

