On Sun, Feb 10, 2008 at 07:33:34PM -0500, Nicolas Pitre wrote:
> On Sun, 10 Feb 2008, Junio C Hamano wrote:
>
> > mkoegler@xxxxxxxxxxxxxxxxx (Martin Koegler) writes:
> >
> > > This patch adds a cache to keep the object data in memory. The delta
> > > resolving code must also search in the cache.
> >
> > I have to wonder what the memory pressure in real-life usage
> > will be like.
>
> FWIW, I don't like this idea.
>
> I'm struggling to find ways to improve performances of
> pack-objects/index-pack with those large repositories that are becoming
> more common (i.e. GCC, OOO, Mozilla, etc.)  Anything that increase
> memory usage isn't very welcome IMHO.

Maybe I have missed something, but all the repack problems reported on
the git mailing list happen during the deltification phase, and the
problematic files are mostly bigger blobs. I'm aware of these problems,
so my patch does not keep any blobs in memory.

As we are talking about memory, let's ignore unpack-objects, which is
used for small packs, and compare the memory usage of index-pack with
that of pack-objects.

If the cache is disabled (no --strict passed), only one (unused) pointer
is additionally allocated for each object in the received pack file. On
i386, struct object_entry is 84 bytes in pack-objects, but only 52 in
index-pack. Both programs keep a struct object_entry in memory for each
object for their whole runtime, so in this case index-pack uses less
memory than pack-objects.

If the --strict option is passed, more memory is used:

* Again, we add one pointer to struct object_entry. object_entry is still
  smaller (52 < 84 bytes).

* index-pack allocates a struct blob/tree/commit/tag for each object in
  the pack. pack-objects also allocates one, but only a plain struct
  object in the best case (reading from a pack file), otherwise a struct
  blob/tree/commit/tag. These objects are kept in memory for the whole
  runtime of pack-objects.
  So, depending on the parameters of pack-objects, index-pack uses up to
  24 additional bytes per object, but its struct object_entry is 32 bytes
  smaller.

* index-pack allocates a struct blob/tree/commit/tag for each link to an
  object outside the pack. I don't know the pack-objects code well enough
  to comment on this point.

* index-pack keeps the data of each tag/tree/commit in the pack in
  memory. In the next version, I won't need to keep the tag/commit data
  in memory. Tree data could be reconstructed from the written pack, but
  I'm not sure whether the memory saved would justify the additional code
  (resolving the deltas again).

So my conclusion is that the memory usage of index-pack with --strict
should not be much worse than that of pack-objects. Please remember that
--strict is used for pushing data.

mfg Martin Kögler