Nicolas Pitre <nico@xxxxxxx> wrote:
> Those delta chains aren't simple chained lists.  They are trees of
> deltas where one object might be the base for an unlimited number of
> deltas of depth 1, and in turn each of those deltas might constitute
> the base for an unlimited number of deltas of depth 2, and so on.
>
> So what the code does is to find out which objects are not deltas but
> are the base for a delta.  Then, for each of them, all deltas having
> the given object for a base are found and recursively resolved, so
> each resolved delta is then considered a possible base for more
> deltas, etc.  In other words, those deltas are resolved by walking
> the delta tree in a "depth first" fashion.
>
> If we discard previous delta bases, we will have to recreate them
> each time a delta sibling is processed.  And if those delta bases
> are themselves deltas then you have an explosion of delta results
> to re-compute.

Yes, it would be horrible if we had to recompute 10 deltas to recover
a previously discarded delta base just to visit its siblings.

But it's even more horrible that we use 512M of memory in our working
set on a 256M machine to process a pack that is only 300M in size,
due to long delta chains on large objects.  In such a case the system
will swap and perform poorly due to the huge disk IO needed to keep
moving the working set around.  We're better off keeping our memory
usage low and recomputing a delta base when we need to return to it
to process a sibling.

Please.  Remember that index-pack, unlike unpack-objects, does not
hold the unresolved deltas in memory while processing the input.  It
assumes the total size of the unresolved deltas may exceed the memory
available for our working set, and writes them to disk to be read
back later during the resolving phase.

At some point it is possible for the completely inflated delta chain
to exceed the physical memory of the system.  As soon as that happens
you are committed to some form of swapping.  We can probably do that
swapping better ourselves, by re-inflating the heavily compressed
deltas from the pack file, than by letting the OS page huge tracts of
the virtual address space back in off the swap device.  Plus the OS
does not need to expend disk IO to swap the pages out; we already
paid that cost when we wrote the pack file to disk as part of our
normal operation.

I don't like adding code either.  But I think I'm right.  We really
must not let index-pack create these massive working sets and just
assume the operating system will handle them magically.  Memory is
not infinite, even if Turing machine theory claims it is.

-- 
Shawn.
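To make the trade-off concrete, here is a minimal sketch of the
depth-first resolve loop with the delta base re-inflated on demand
instead of held across the sibling loop.  Every type and helper below
(inflate_from_pack, patch_delta_data, record_resolved) is a
hypothetical stand-in for the real pack machinery, not the actual
index-pack code:

	#include <stdlib.h>
	#include <sys/types.h>

	struct object_entry {
		off_t offset;                 /* object's position in the pack */
		struct object_entry **deltas; /* deltas using this object as base */
		int nr_deltas;
	};

	/* Hypothetical helpers, stand-ins for the real pack routines: */
	extern void *inflate_from_pack(off_t offset, unsigned long *sizep);
	extern void *patch_delta_data(const void *base, unsigned long base_size,
				      off_t delta_offset, unsigned long *sizep);
	extern void record_resolved(struct object_entry *e,
				    const void *data, unsigned long size);

	static void resolve_children(struct object_entry *base_entry)
	{
		int i;

		for (i = 0; i < base_entry->nr_deltas; i++) {
			struct object_entry *child = base_entry->deltas[i];
			unsigned long base_size, size;

			/*
			 * Re-inflate the base from the pack for each sibling
			 * rather than keeping it resident across the loop;
			 * the working set stays small at the price of extra
			 * CPU.  If the base is itself a delta, this call
			 * re-does the whole chain below it -- the "explosion
			 * of delta results to re-compute" in the quoted text.
			 */
			void *base = inflate_from_pack(base_entry->offset,
						       &base_size);
			void *data = patch_delta_data(base, base_size,
						      child->offset, &size);
			free(base);

			record_resolved(child, data, size);
			resolve_children(child); /* child may base deeper deltas */
			free(data);              /* discard; recompute if needed */
		}
	}

The two free() calls are what keep the resident set bounded; whether
the re-inflation cost they force is acceptable is exactly the point
under dispute above.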