Re: [PATCH 6/7] diff: cache textconv output

On Fri, Apr 02, 2010 at 12:23:06AM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > Running a textconv filter can take a long time. It's
> > particularly bad for a large file which needs to be spooled
> > to disk, but even for small files, the fork+exec overhead
> > can add up for something like "git log -p".
> 
> Another reason that "log -p" gets benefit from caching is that you would
> typically end up running textconv twice on the same blob, once when you
> compare $commit:$path with $commit~1:$path, and again when you compare
> $commit~$n-1:$path with $commit~$n:$path (assuming that the $path didn't
> change between $commit~$n-1 and $commit~1).

Yep. I pointed out in one of my timing tests a slight slowdown in "git
show" when generating the cache. But for revision walking, it should
actually be faster, since you see each blob twice but cache after the
first time.
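
To illustrate the "each blob twice" point, here is a toy sketch rather than the code from the patch. It assumes a single path, a linear history, and an in-memory dict standing in for the real cache; `count_filter_runs` and the blob names are made up for the example.

```python
def count_filter_runs(blob_per_commit, use_cache):
    """Count textconv invocations while diffing each commit against its parent.

    blob_per_commit lists the blob id of one path, newest commit first.
    """
    cache = {}  # hypothetical stand-in for the textconv cache, keyed by blob id
    runs = 0

    def textconv(blob_id):
        nonlocal runs
        if use_cache and blob_id in cache:
            return cache[blob_id]
        runs += 1  # stands in for the fork+exec of the real filter
        text = "text of " + blob_id
        if use_cache:
            cache[blob_id] = text
        return text

    for newer, older in zip(blob_per_commit, blob_per_commit[1:]):
        if newer != older:  # the path changed in this commit
            textconv(older)
            textconv(newer)
    return runs


# The path changes in every commit: blobs "c" and "b" each appear in two diffs.
every_commit = ["d", "c", "b", "a"]
# The path changes only occasionally: only blob "b" appears in two diffs.
sparse = ["c", "c", "b", "b", "b", "a"]
```

In the `every_commit` history the uncached walk runs the filter six times and the cached walk four times, because the interior blobs are converted once instead of twice.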

> It _might_ give you even better performance characteristics if you notice
> that you are walking history running many textconv, and cache the textconv
> result from the "older" (i.e. "one" side) tree in-core, until it is used
> in a "newer" (i.e. "two" side) tree, at which time you would evict it.

I doubt it is worth the effort. We are already caching the sha1 in-core
due to the notes mechanism. So we could really only save one object
retrieval. Which is already what a non-textconv diff will need to do, so
we should have performance on par with regular diffs at this point.

In fact, your optimization could be applied to all diff revision
walking, not just textconv, and you can halve the number of object
retrievals. The problem is that you may have blobs sitting in the
in-core cache as you walk many revisions, waiting for them to be changed
again. Depending on the locality of changes and the size of your
project, you won't be able to fit it all comfortably in memory, and will
end up discarding entries.
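
A rough sketch of that trade-off, under stated assumptions: the walk is reduced to a list of (newer blob, older blob) pairs, the in-core cache has a fixed number of slots, and the oldest waiting entry is discarded when it fills up. None of these names exist in git; `count_retrievals` is purely illustrative.

```python
from collections import OrderedDict


def count_retrievals(changes, capacity):
    """Count object retrievals for a walk, newest commit first.

    changes is a list of (newer_blob, older_blob) pairs, one per changed path.
    """
    pending = OrderedDict()  # "one"-side blobs waiting to be reused
    retrievals = 0
    for newer, older in changes:
        if newer in pending:
            pending.pop(newer)  # reused as the "two" side, then evicted
        else:
            retrievals += 1
        retrievals += 1  # the "one" side is always fetched
        pending[older] = True
        if len(pending) > capacity:
            pending.popitem(last=False)  # out of room: discard the oldest entry
    return retrievals


# Three files whose changes are interleaved across the walk.
changes = [
    ("a3", "a2"), ("b3", "b2"), ("c3", "c2"),
    ("a2", "a1"), ("b2", "b1"), ("c2", "c1"),
]
```

With room for three entries this walk needs 9 retrievals instead of 12. With room for only one, every entry is discarded before it is needed again and the count stays at 12, which is the locality problem described above.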

And all of that to save a few object retrievals, which are something
that git does very quickly already. Not to mention the ugly code that
would be involved.

-Peff