Re: git annotate runs out of memory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 12, 2007 at 02:57:25 -0500, Jeff King wrote:
> On Tue, Dec 11, 2007 at 11:50:08AM -0800, Linus Torvalds wrote:
> > And, btw: the diff is totally different from the xdelta we have, so even 
> > if we have an already prepared nice xdelta between the two versions, we'll 
> > end up re-generating the files in full, and then do a diff on the end 
> > result.

The problem is whether git does not end-up re-generating the same file
multiple times. When it needs to construct the diff between two versions of
a file and one is delta-base (even indirect) of the other, does it know to
create the first, remember it, continue to the other and calculate the diff?

> > Of course, part of that is that git logically *never* works with deltas, 
> > except in the actual code-paths that generate objects (or generate packs, 
> > of course). So even if we had used a delta algorithm that would be 
> > amenable to be turned into a diff directly, it would have been a layering 
> > violation to actually do that.
> 
> That doesn't mean we can't opportunistically jump layers when available,
> and fall back on the regular behavior otherwise. The nice thing about
> clean and simple layers is that you can always add optimizations later
> by poking sane holes.
> 
> Let's assume for the sake of argument that we can convert an xdelta into
> a diff fairly cheaply.  Using the patch below, we can count the places
> where we are diffing two blobs, and one blob is a delta base of the
> other (assuming our magical conversion function can also reverse diffs.
> ;) ).
> 
> For a "git log -p" on git.git, I get:
> 
>    9951 diffs could be optimized
>   10958 diffs could not be optimized
> 
> or about 48%. It would be nice if we could drop the cost by almost 50%
> (if our magical function is free to call, too!).

This is actually a gross underestimation. The idea would be to know all the
diffs we need to calculate and than remember all useful results. Ie. if we
know we'll want objects A and C, A's delta base is B and B's delta base is C,
start calculating A and when it turns out to need C at some point, just
remember it for purpose of doing the final diff. On the other hand B can be
thrown away early (because we don't need it) to save memory.

Now git can know the list of deltas it will need in advance. First generate
the list of revisions -- nothing helps there, but their delta bases are
likely to be randomish anyway -- and than with the knowledge of full list of
trees, start doing the diffs to see which touched the subtree in question.
Repeat for each level.

Since the list of deltas that will be needed is known, the objects from
which all deltas were already generated can be expired from cache (but not
thrown away immediately, as they may help building other objects).

> Of course, I haven't even looked at whether converting xdeltas to
> unified diffs is possible. I suspect in some cases it is (e.g., pure
> addition of text) and in some cases it isn't (I assume xdelta doesn't
> have any context lines, which might hurt). And it's possible that a
> specialized diff user like git-blame can just learn to use the xdeltas
> by itself (I didn't get a "could optimize" count for git-blame since
> it seems to follow a different codepath for its diffs).

Well, it's about as hard as applying them, because you can remember the
necessary stuff when applying. The imporant bit would be to avoid applying
the same delta more than once during the whole annotate.

-- 
						 Jan 'Bulb' Hudec <bulb@xxxxxx>
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux