On Sun, 12 Aug 2007, David Kastrup wrote:

> "Jon Smirl" <jonsmirl@xxxxxxxxx> writes:
>
> > If anyone is bored and looking for something to do, making the delta
> > code in git repack multithreaded would help.
>
> I severely doubt that.  It is like the "coding stuff in assembly
> language will make it faster" myth.  The problem is that of manageable
> complexity.  Making the stuff multithreaded or coded in assembly means
> that it becomes inaccessible for a sound algorithmic redesign.

I have to admit that I'm not a huge fan of threading: the complexity and
locking often kill you, if memory bandwidth constraints do not, and the
end result is often really, really hard to debug.

That said, I suspect we could do some *simple* form of this by just
partitioning the problem space up: we could have a MT repack that
generates four *different* packs on four different CPUs, each thread
taking one quarter of the objects.

At that point, you wouldn't even need threads; you could do it with
regular processes, since the problem set is fully partitioned once
you've generated the list of objects!

Then, after you've generated four different packs, doing a "git gc"
(without any threading) will repack them into one big pack, and mostly
just re-use the existing deltas.

So this would not be a generic thing, but it could be something that is
useful for the forced full repack after importing a large repository
with fast-import, for example.

So while I agree with David in general about the problem of threading, I
think that we can possibly simplify the special case of repacking into
something less complicated than a "real" multi-threading problem.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
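[Editor's note: the process-per-partition idea above can be sketched with plain git plumbing. This is a hypothetical illustration, not anything from the thread: it builds a throwaway repository, splits the full object list into disjoint slices with GNU `split -n l/4`, runs one `git pack-objects` process per slice in the background, and then lets a single-threaded `git repack -a -d` fold the partial packs into one. The slice/pack file names are made up for the example.]

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q .
for i in 1 2 3 4 5 6 7 8; do echo "file $i" > "f$i"; done
git add .
git -c user.email=a@b -c user.name=a commit -qm init

# Full object list, then four disjoint slices.  Because the slices are
# fully partitioned, the workers share no state and need no locking.
git rev-list --objects --all | cut -d' ' -f1 > objects.txt
split -n l/4 objects.txt slice.        # GNU split: slice.aa .. slice.ad

# One ordinary process per slice -- no threads required.
for s in slice.*; do
    git pack-objects -q .git/objects/pack/part-"$s" < "$s" &
done
wait

# A later single-threaded repack folds the partial packs into one big
# pack, mostly reusing the deltas already computed above.
git repack -a -d -q
```

After the final `git repack -a -d`, the object store is left with a single pack; the parallel phase only has to get the deltas computed, not produce the final layout.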