Re: [PATCH 3/3] revision: insert unsorted, then sort in prepare_revision_walk()

Jeff King <peff@xxxxxxxx> · Mon, 2 Apr 2012 16:37:28 -0400

On Mon, Apr 02, 2012 at 09:51:21AM -0700, Shawn O. Pearce wrote:

> Probably. But we tend to hate caches in Git because they can get stale
> and need to be rebuilt, and are redundant with the base data. The
> mythical "pack v4" work was going to approach this problem by storing
> the commit timestamps uncompressed in a more machine friendly format.
> Unfortunately the work has been stalled for years.

I'd love for packv4 to exist, but even once it does, it comes with its
own complications for network transfer (since we will have to translate
to/from packv2 on the wire).

Has anyone looked seriously at a new index format that stores the
redundant information in a more easily accessible way? It would increase
our disk usage, but for something like linux-2.6, only by about 10MB for
each 32-bit word we store per object. On most of my systems I would gladly spare some extra RAM
for the disk cache if it meant I could avoid inflating a bunch of
objects. And this could easily be made optional for systems that don't
want to make the tradeoff (if it's not there, you fall back to the
current procedure; we could even store the data in a separate file to
retain indexv2 compatibility).
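A rough sketch of what such an optional side file might look like follows, under explicit assumptions. The layout is hypothetical and not a real Git format: one big-endian 32-bit word per object, in the same order as the pack index's sorted object names, holding the commit timestamp (0 for non-commits). The names write_timestamp_table, table_size, and CommitTimes are made up for illustration. The reader falls back to a caller-supplied "inflate and parse" function when the file is absent. The size arithmetic matches the estimate above: 10MB per 32-bit word implies roughly 10,000,000 / 4 = 2.5 million objects.

```python
import struct
from typing import Callable, Optional, Sequence

# Hypothetical side-file layout (NOT a real Git format): one big-endian
# uint32 per object, ordered like the pack index's sorted object names.
ENTRY = struct.Struct(">I")


def write_timestamp_table(timestamps: Sequence[int]) -> bytes:
    """Serialize one 32-bit commit timestamp per object (0 for non-commits)."""
    return b"".join(ENTRY.pack(t) for t in timestamps)


def table_size(num_objects: int, words_per_object: int = 1) -> int:
    """Extra bytes on disk: 4 bytes for each 32-bit word stored per object."""
    return num_objects * words_per_object * ENTRY.size


class CommitTimes:
    """Reads timestamps from the optional table, or falls back to inflating."""

    def __init__(self, table: Optional[bytes], inflate: Callable[[int], int]):
        # table is None when the optional side file does not exist.
        self.table = table
        # Slow path (hypothetical hook): inflate and parse the commit object.
        self.inflate = inflate

    def timestamp(self, index_pos: int) -> int:
        if self.table is None:
            # No side file: use the current procedure.
            return self.inflate(index_pos)
        # Fast path: a fixed-width read at the object's pack-index position.
        (value,) = ENTRY.unpack_from(self.table, index_pos * ENTRY.size)
        return value
```

For example, table_size(2_500_000) is 10,000_000 bytes, i.e. the ~10MB per word quoted above. Keeping the table in its own file means an old reader simply never opens it.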

So it's sort of a cache, in that it's redundant with the actual data.
But staleness and writing issues are a lot simpler, since it only gets
updated when we index the pack (and the pack index in general is a
similar concept; we are "caching" the location of the object in the
packfile, rather than doing a linear search to look it up each time).
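To make the pack index analogy concrete, here is a simplified sketch of the kind of lookup the .idx enables. It mirrors the v2 index's 256-entry fanout table over sorted object names, followed by a binary search within one first-byte bucket. The function names are hypothetical, and real .idx files also store offsets and CRCs that are omitted here. The position this lookup returns is exactly what would index a parallel per-object table like the one proposed above. Because both are produced together when the pack is indexed, neither can go stale independently.

```python
import bisect
from typing import List, Optional


def build_fanout(names: List[bytes]) -> List[int]:
    """fanout[b] = count of sorted names whose first byte is <= b."""
    fanout = [0] * 256
    for name in names:
        fanout[name[0]] += 1
    running = 0
    for b in range(256):
        running += fanout[b]
        fanout[b] = running
    return fanout


def find_position(names: List[bytes], fanout: List[int], oid: bytes) -> Optional[int]:
    """Return oid's position in the sorted name table, or None if absent."""
    first = oid[0]
    # The fanout narrows the search to names sharing oid's first byte.
    lo = fanout[first - 1] if first > 0 else 0
    hi = fanout[first]
    pos = bisect.bisect_left(names, oid, lo, hi)
    if pos < hi and names[pos] == oid:
        return pos
    return None
```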

-Peff