Nicolas Pitre <nico@xxxxxxx> wrote:
> Those delta chains aren't simple chained lists.  They are trees of
> deltas where one object might be the base for an unlimited number of
> deltas of depth 1, and in turn each of those deltas might constitute
> the base for an unlimited number of deltas of depth 2, and so on.
>
> So what the code does is to find out which objects are not deltas but
> are the base for a delta.  Then, for each of them, all deltas having
> the given object for a base are found and recursively resolved, so
> each resolved delta is then considered a possible base for more
> deltas, etc.  In other words, those deltas are resolved by walking
> the delta tree in a "depth first" fashion.
>
> If we discard previous delta bases, we will have to recreate them
> each time a delta sibling is processed.  And if those delta bases
> are themselves deltas then you have an explosion of delta results
> to re-compute.

Yes, it would be horrible if we had to recompute 10 deltas to recover
a previously discarded delta base just to visit its siblings.

But it's even more horrible that we use 512M of memory in our working
set on a 256M machine to process a pack that is only 300M in size,
due to long delta chains on large objects.  In such a case the system
will swap and perform poorly due to the huge disk IO needed to keep
moving the working set around.  We're better off keeping our memory
usage low and recomputing a delta base when we need to return to it
to process a sibling.

Please.  Remember that index-pack, unlike unpack-objects, does not
hold the unresolved deltas in memory while processing the input.  It
assumes the total size of the unresolved deltas may exceed the memory
available for our working set, and writes them to disk to be read
back later during the resolving phase.

At some point it is possible for the completely inflated delta chain
to exceed the physical memory of the system.  As soon as that happens
you are committed to some form of swapping.  We can probably do that
swapping better ourselves, by re-inflating the heavily compressed
deltas from the pack file, than by letting the OS page huge tracts of
the virtual address space back in off the swap device.  Plus the OS
does not need to expend disk IO to swap the pages out; we already
paid that cost when we wrote the pack file to disk as part of our
normal operation.

I don't like adding code either.  But I think I'm right.  We really
must not let index-pack create these massive working sets and just
assume the operating system will handle them magically.  Memory is
not infinite, even if Turing machine theory claims it is.

-- 
Shawn.
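To make the trade-off concrete, here is a minimal sketch of the
depth-first resolve loop with the delta base re-inflated on demand
instead of held across the sibling loop.  Every type and helper below
(inflate_from_pack, patch_delta_data, record_resolved) is a
hypothetical stand-in for the real pack machinery, not the actual
index-pack code:

	#include <stdlib.h>
	#include <sys/types.h>

	struct object_entry {
		off_t offset;                 /* object's position in the pack */
		struct object_entry **deltas; /* deltas using this object as base */
		int nr_deltas;
	};

	/* Hypothetical helpers, stand-ins for the real pack routines: */
	extern void *inflate_from_pack(off_t offset, unsigned long *sizep);
	extern void *patch_delta_data(const void *base, unsigned long base_size,
				      off_t delta_offset, unsigned long *sizep);
	extern void record_resolved(struct object_entry *e,
				    const void *data, unsigned long size);

	static void resolve_children(struct object_entry *base_entry)
	{
		int i;

		for (i = 0; i < base_entry->nr_deltas; i++) {
			struct object_entry *child = base_entry->deltas[i];
			unsigned long base_size, size;

			/*
			 * Re-inflate the base from the pack for each sibling
			 * rather than keeping it resident across the loop;
			 * the working set stays small at the price of extra
			 * CPU.  If the base is itself a delta, this call
			 * re-does the whole chain below it -- the "explosion
			 * of delta results to re-compute" in the quoted text.
			 */
			void *base = inflate_from_pack(base_entry->offset,
						       &base_size);
			void *data = patch_delta_data(base, base_size,
						      child->offset, &size);
			free(base);

			record_resolved(child, data, size);
			resolve_children(child); /* child may base deeper deltas */
			free(data);              /* discard; recompute if needed */
		}
	}

The two free() calls are what keep the resident set bounded; whether
the re-inflation cost they force is acceptable is exactly the point
under dispute above.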