On Mon, Nov 12 2018, Jeff King wrote:

> In cases where we expect to ask has_sha1_file() about a lot of objects
> that we are not likely to have (e.g., during fetch negotiation), we
> already use OBJECT_INFO_QUICK to sacrifice accuracy (due to racing with
> a simultaneous write or repack) for speed (we avoid re-scanning the pack
> directory).
>
> However, even checking for loose objects can be expensive, as we will
> stat() each one. On many systems this cost isn't too noticeable, but
> stat() can be particularly slow on some operating systems, or due to
> network filesystems.
>
> Since the QUICK flag already tells us that we're OK with a slightly
> stale answer, we can use that as a cue to look in our in-memory cache of
> each object directory. That basically trades an in-memory binary search
> for a stat() call.
>
> Note that it is possible for this to actually be _slower_. We'll do a
> full readdir() to fill the cache, so if you have a very large number of
> loose objects and a very small number of lookups, that readdir() may end
> up more expensive.
>
> This shouldn't be a big deal in practice. If you have a large number of
> reachable loose objects, you'll already run into performance problems
> (which you should remedy by repacking). You may have unreachable objects
> which wouldn't otherwise impact performance. Usually these would go away
> with the prune step of "git gc", but they may be held for up to 2 weeks
> in the default configuration.
>
> So it comes down to how many such objects you might reasonably expect to
> have, how much slower is readdir() on N entries versus M stat() calls
> (and here we really care about the syscall backing readdir(), like
> getdents() on Linux, but I'll just call this readdir() below).
>
> If N is much smaller than M (a typical packed repo), we know this is a
> big win (few readdirs() followed by many uses of the resulting cache).
> When N and M are similar in size, it's also a win. We care about the
> latency of making a syscall, and readdir() should be giving us many
> values in a single call. How many?
>
> On Linux, running "strace -e getdents ls" shows a 32k buffer getting 512
> entries per call (which is 64 bytes per entry; the name itself is 38
> bytes, plus there are some other fields). So we can imagine that this is
> always a win as long as the number of loose objects in the repository is
> a factor of 500 less than the number of lookups you make. It's hard to
> auto-tune this because we don't generally know up front how many lookups
> we're going to do. But it's unlikely for this to perform significantly
> worse.
>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
> There's some obvious hand-waving in the paragraphs above. I would love
> it if somebody with an NFS system could do some before/after timings
> with various numbers of loose objects, to get a sense of where the
> breakeven point is.
>
> My gut is that we do not need the complexity of a cache-size limit, nor
> of a config option to disable this. But it would be nice to have a real
> number where "reasonable" ends and "pathological" begins. :)

I'm happy to test this on some of the NFS setups we have locally. I
started out with a plan to write a for-loop using the low-level API (so
it would look up all 256 object directories), fake-populating
.git/objects/?? with N objects and so on, but ran out of time. Do you
have something ready that you think would be representative and that I
could just run? If not I'll try to pick this up again...
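To sketch what I had in mind (untested, and the repo names, refspec and
the N/M defaults below are just made up for illustration): populate a
throwaway "client" repo with N fake loose objects, have a "server" repo
advertise M refs the client doesn't have, and time the fetch with and
without your patch. My assumption is that the check of the advertised
tips during negotiation is the has_sha1_file(..., QUICK) path you're
changing here; if a different driver would be more representative I'm
happy to adjust:

#!/bin/sh
# Rough before/after benchmark sketch for the loose-object cache.
# Assumes the advertised-ref check during fetch is the QUICK lookup
# path the patch changes.
# N = loose objects sitting in the client's object directory
# M = refs the server advertises that the client does not have
N=${1:-50000}
M=${2:-10000}

rm -rf server client

# server: M branches, each pointing at a distinct (empty) commit
git init -q server
(
	cd server || exit 1
	git commit -q --allow-empty -m base
	i=0
	while test $i -lt $M
	do
		git commit -q --allow-empty -m "c$i"
		git branch "b$i"
		i=$((i + 1))
	done
)

# client: unrelated history plus N fake loose objects; the file
# contents are irrelevant, the entries just need to show up in
# readdir() with plausible <2-char>/<38-char> hex names
git init -q client
(
	cd client || exit 1
	git commit -q --allow-empty -m unrelated
	for d in $(seq -w 0 99)
	do
		mkdir -p ".git/objects/$d"
	done
	i=0
	while test $i -lt $N
	do
		printf x >".git/objects/$(printf "%02d" $((i % 100)))/$(printf "%038d" $i)"
		i=$((i + 1))
	done
)

# the interesting number: how long the client spends deciding it has
# none of the advertised tips (the transfer of the empty commits is
# included, but that part is the same with and without the patch)
time git -C client fetch -q --no-tags ../server \
	"refs/heads/*:refs/remotes/server/*"

The shell loops will obviously be slow for large N and M; generating
the commits with fast-import and the fake loose objects some other way
would speed up the setup, but shouldn't change the fetch timing itself.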