Since we're cutting corners to speed things up, could you try something
like this? I notice that reading v4 is significantly slower than v2 and
apparently strlen() (at least from glibc) is much cleverer and at least
gives me a few percentage time saving.

diff --git a/read-cache.c b/read-cache.c
index 7b1354d759..d10cccaed0 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1755,8 +1755,7 @@ static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
 	if (name->len < len)
 		die("malformed name field in the index");
 	strbuf_remove(name, name->len - len, len);
-	for (ep = cp; *ep; ep++)
-		; /* find the end */
+	ep = cp + strlen(cp);
 	strbuf_add(name, cp, ep - cp);
 	return (const char *)ep + 1 - cp_;
 }

On Thu, Aug 23, 2018 at 10:36 PM Ben Peart <peartben@xxxxxxxxx> wrote:
> > Nice to see this done without a new index extension that records
> > offsets, so that we can load existing index files in parallel.
> >
>
> Yes, I prefer this simpler model as well.  I wasn't sure it would
> produce a significant improvement given the primary thread still has to
> run through the variable length cache entries but was pleasantly surprised.

Out of curiosity, how much time saving could we gain by recording
offsets as an extension (I assume we need, like 4 offsets if the
system has 4 cores)? Much much more than this simpler model (which may
justify the complexity) or just "meh" compared to this?
-- 
Duy
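
For anyone who wants to try the idea outside of git, here is a minimal standalone sketch of what the patched hunk does: drop some bytes from the end of the previous path, then append the NUL-terminated suffix, using strlen() to find its end. The function name expand_name(), the fixed-size buffer and passing the strip count directly are assumptions made for illustration; the real expand_name_field() works on a strbuf and takes the count from the entry itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for expand_name_field().
 *
 * name     - previous path, NUL-terminated, in a buffer of name_cap bytes
 * strip    - number of bytes to drop from the end of the previous path
 * suffix   - NUL-terminated bytes to append, as stored in the entry
 *
 * Returns the number of bytes consumed from the entry (the suffix plus
 * its NUL), or 0 if the input is malformed or the result does not fit.
 */
static size_t expand_name(char *name, size_t name_cap,
			  size_t strip, const char *suffix)
{
	size_t old_len, suffix_len;

	assert(name && suffix);

	old_len = strlen(name);
	if (old_len < strip)
		return 0; /* malformed: cannot strip more than we have */

	/* strlen() instead of a hand-written byte-at-a-time scan */
	suffix_len = strlen(suffix);

	if (old_len - strip + suffix_len + 1 > name_cap)
		return 0; /* result would not fit in the buffer */

	/* overwrite the stripped tail with the suffix and its NUL */
	memcpy(name + (old_len - strip), suffix, suffix_len + 1);
	return suffix_len + 1;
}
```

Starting from "Documentation/Makefile", stripping 8 bytes and appending "git.txt" leaves "Documentation/git.txt" in the buffer and reports 8 bytes consumed. Any speedup from strlen() depends on the libc in use, so it needs measuring on the target system.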