On Wed, Sep 05, 2007 at 08:50:04AM +0000, Steven Grimm wrote:
> Pierre Habouzit wrote:
> >   Well, independently of the fact that one could suppose that users
> > should run gc on their own, the big nasty problem with repacking is that
> > it's really slow. And I just can't imagine git, which I use to commit
> > blazingly fast, then being unavailable for a very long time (repacks
> > on my projects -- that are not as big as the kernel but still -- usually
> > take 10 to 20 seconds or more each).
>
> What about kicking off a repack in the background at the ends of certain
> commands? With an option to disable, of course. It could run at a low
> priority and could even sleep a lot to avoid saturating the system's
> disks -- since it'd be running asynchronously there should be no problem
> if it takes longer to run.

  There is an issue with that: repack is memory and CPU intensive. Of
course, renicing the process deals with the CPU issue, but not with the
memory one. I've often seen repacks eat 300 to 400 MB of memory on
not-so-big repositories. From experience (not from reading the code), it
seems that if you have some big binary blobs (we have .swf and .fla files
in our repository), a repack can consume quite a lot of RAM, presumably to
compute efficient deltas. Sadly, there is no way to "renice" the RAM usage
of a process: once a repack is launched, it will make your system swap and
bring the whole computer to its knees.

> IMO expecting end users to regularly perform what are essentially
> database administration tasks (running git-gc is akin to rebuilding
> indexes or packing tables on a DBMS) is naive. Heck, even database
> administrators don't like to run database administration commands;

  Well, that's what cron jobs are for. When you install a DBMS on any
reasonable distribution, it comes with optimization scripts in cron,
scheduled at a sensible time of day (local time). So the comparison
doesn't hold.
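To make the trade-off in this exchange concrete, here is a minimal sketch of the background, low-priority repack Steven proposes. It is a hypothetical shell helper (`run_background_gc` is an invented name, not a git command) and assumes a POSIX shell with `nice`, plus util-linux `ionice` when available. It lowers CPU and disk priority, but, as argued above, nothing in it bounds the memory the repack uses.

```shell
#!/bin/sh
# Hypothetical helper, not part of git: start "git gc" for one repository
# in the background at the lowest CPU priority (nice 19) and, when the
# system permits it, the idle I/O class (ionice -c 3).
# It does NOT limit memory: a big repack can still push the machine into swap.
run_background_gc() {
    repo_dir=$1
    (
        cd "$repo_dir" || exit 1
        # Probe whether ionice exists and may set the idle class here
        if ionice -c 3 true >/dev/null 2>&1; then
            exec nice -n 19 ionice -c 3 git gc --quiet
        else
            exec nice -n 19 git gc --quiet
        fi
    ) &
    # The function runs in the caller's shell, so the caller's $! now
    # holds the PID of this background job
}
```

Usage: `run_background_gc "$HOME/dev/project"`, optionally followed by `wait $!`. Renicing only changes scheduling priority; the delta search still allocates whatever memory it needs, which is exactly the objection raised above.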
  The lack of such a cron job is exactly the problem for git: it's quite
hard to ship git with an optimizing cron task, because we can't know
where users keep their repositories, or when they work, so somehow you
have to do it yourself.

  Or you can handle it with a "rule". At work, we keep our development
trees under $HOME/dev/, so the cron job we use is roughly:

    # run "git gc" in every repository found up to 4 levels below ~/dev
    find "$HOME/dev/" -maxdepth 4 -type d -name .git | while read -r repo; do
        GIT_DIR="$repo" git gc
    done

As we work on NFS, when a new developer arrives we can just set up the
cron job for him at a time when he's not supposed to be at work, and
that's it.

  I'm not sure there is a good solution at all. Or we could also provide
a:

    git-coffee-break

command that would tell git: "do whatever you want with this computer for
the next 10 minutes, there won't be anyone watching". But I assume tea
lovers would feel excluded.

-- 
·O·  Pierre Habouzit
··O                                                madcoder@xxxxxxxxxx
OOO                                                http://www.madism.org