On Wed, Aug 14, 2013 at 07:04:37PM +0200, Stefan Beller wrote:

> But apart from my blabbering, I think ivegy made a good point:
> The C parts just don't rely on external things, but only libc and
> kernel, so it may be nicer than a shell script. Also as it is used
> serversided, the performance aspect is not negligible.
>
> I included Jeff King, who maybe could elaborate on git-repack on the
> serverside?

I don't think the performance of repack as a C program versus a shell
script is really relevant to us at GitHub. Sure, we run a fair number of
repacks, but the cost is totally dominated by the pack-objects process
itself.

You might be able to achieve some speedups if it were not simply a
shell-to-C conversion, but an overall gc rewrite that did more in a
single process and reused results (for example, you could reuse all or
part of the history traversal from pack-objects' "counting objects"
phase to do the reachability analysis during prune)[1].

But I'd be very wary of stuffing too many things into a single process.
There are parts of the code that make assumptions about which objects
have been seen in the global object hash table (I believe index-pack is
one of these; see check_objects). And there are parts of the code which
must run separately (e.g., the connectivity check after transfer runs in
a separate process, both because it may die(), but also because we want
a clean slate of which packs are available, with no caching of results
we may have seen).

None of those problems is unsolvable, but it's very hard to know when
one is going to pop up and bite you. And because the repacking and
pruning code is the most likely place for a bug to cause data loss, it
makes me a bit nervous.

-Peff

[1] Another way to reuse the history traversal is to generate the
much-discussed pack reachability bitmaps, and then use them in
git-prune.
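[Editor's note: for readers less familiar with the plumbing involved, `git repack` is essentially a driver around `git pack-objects`, which is where the traversal and compression cost lives. A rough sketch of that core pipeline (the `pack` base name and working directory are arbitrary; this omits the ref updates and old-pack cleanup that repack also performs):

```shell
# Enumerate every reachable object, then feed the list to pack-objects,
# which does the expensive counting/delta/compression work and writes
# pack-<hash>.pack and pack-<hash>.idx in the current directory.
# pack-objects prints the resulting pack's hash on stdout.
git rev-list --objects --all |
git pack-objects pack
```

This is why a shell-vs-C rewrite of the driver barely moves the needle: nearly all of the wall-clock time is spent inside the single `pack-objects` child process, not in the glue around it.]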