On Mon, Oct 29, 2018 at 09:34:53PM +0000, Geert Jansen wrote:

> As an example, this means that when you're receiving a pack file with 1K
> objects in a repository with 10K loose objects, the loose-object-cache
> patch has roughly the same performance as the current git. I'm not sure if
> this is something to worry about, as I'm not sure people run repos with
> this many loose files. If it is a concern, there could be a flag to turn
> the loose object cache on/off.

So yeah, that's the other thing I'm thinking about regarding having a
maximum loose cache size.

10k objects is only 200KB in memory. That's basically nothing. At some
point you run into pathological cases, like having a million objects
(but that's still only 20MB, much less than we devote to other caches,
though of course they do add up).

If you have a million loose objects, I strongly suspect you're going to
run into other problems (like space, since you're not getting any
deltas).

The one thing that gives me pause is that if you have a bunch of unused
and unreachable loose objects on disk, most operations won't actually
look at them at all. The majority of operations are only looking for
objects we expect to be present (e.g., resolving a ref, walking a tree),
and those lookups are satisfied by checking the pack indices first.

So it's possible that Git is _tolerable_ for most operations with a
million loose objects, and we could make it slightly worse by loading
the cache. But I find it hard to get too worked up about spending an
extra 20MB (and the time to readdir() it in) in that case. It seems
like about 400ms on my machine, and the correct next step is almost
always going to be "pack" or "prune" anyway.

-Peff
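For illustration only (this is not the loose-object-cache patch itself,
just a rough standalone sketch): a program along these lines walks the
256 fanout directories with readdir() and estimates the cache footprint
at ~20 bytes per binary oid, which is where the 200KB / 20MB figures
above come from. The 38-character filename check and the default
".git/objects" path are assumptions for the sketch (SHA-1 loose objects).

  /*
   * Rough standalone sketch (not the actual cache code): count loose
   * objects by readdir()-ing the 256 fanout directories and estimate
   * the memory a cache of binary oids would need at ~20 bytes each.
   * 10k objects come out to ~200KB, a million to ~20MB.
   */
  #include <dirent.h>
  #include <stdio.h>
  #include <string.h>

  int main(int argc, char **argv)
  {
  	const char *objdir = argc > 1 ? argv[1] : ".git/objects";
  	unsigned long count = 0;
  	int fanout;

  	for (fanout = 0; fanout < 256; fanout++) {
  		char path[4096];
  		DIR *dir;
  		struct dirent *de;

  		snprintf(path, sizeof(path), "%s/%02x", objdir, fanout);
  		dir = opendir(path);
  		if (!dir)
  			continue;	/* fanout dir may not exist */
  		while ((de = readdir(dir)) != NULL) {
  			/* skip ".", "..", and anything not oid-shaped */
  			if (strlen(de->d_name) == 38)
  				count++;
  		}
  		closedir(dir);
  	}

  	printf("%lu loose objects, ~%lu KB if cached as binary oids\n",
  	       count, count * 20 / 1024);
  	return 0;
  }

Compile it with cc and run it from the top of a repository (or pass the
objects directory as an argument) to get a feel for both the readdir()
cost and the memory numbers on your own repos.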