On Fri, Dec 01, 2017 at 12:49:56PM -0500, Derrick Stolee wrote: > Replace use of strbuf_addf() with strbuf_add() when enumerating > loose objects in for_each_file_in_obj_subdir(). Since we already > check the length and hex-values of the string before consuming > the path, we can prevent extra computation by using the lower- > level method. Makes sense. It's a shame that "addf" is turning out to be so slow (I'm still mildly curious if that differs between Windows and glibc, but there are so many other variables between the two platforms it's hard to test). > One consumer of for_each_file_in_obj_subdir() is the abbreviation > code. OID abbreviations use a cached list of loose objects (per > object subdirectory) to make repeated queries fast, but there is > significant cache load time when there are many loose objects. > > Most repositories do not have many loose objects before repacking, > but in the GVFS case the repos can grow to have millions of loose > objects. Profiling 'git log' performance in GitForWindows on a > GVFS-enabled repo with ~2.5 million loose objects revealed 12% of > the CPU time was spent in strbuf_addf(). Yeah, we haven't heavily micro-optimized the case for having lots of loose objects. The general philosophy about having lots of loose objects is: don't. You're generally going to pay a system call for each access, which is much heavier-weight than packfiles. I think abbreviation-finding is the exception there, though, because we literally just readdir() the entries and never do anything else with them. > Add a new performance test to p4211-line-log.sh that is more > sensitive to this cache-loading. By limiting to 1000 commits, we > more closely resemble user wait time when reading history into a > pager. > > For a copy of the Linux repo with two ~512 MB packfiles and ~572K > loose objects, running 'git log --oneline --raw --parents -1000' > had the following performance: > > HEAD~1 HEAD > ---------------------------------------- > 7.70(7.15+0.54) 7.44(7.09+0.29) -3.4% Thanks for including numbers. I think the setup here highlights a weakness of the perf suite that we've discussed: if there's a performance regression for your case, nobody is likely to notice because they won't test on a repo with 500k loose objects. Probably "repo with a bunch of loose objects" ought to be a stock repository profile for doing regression runs. > diff --git a/sha1_file.c b/sha1_file.c > index 8ae6cb6285..2160323c4a 100644 This overall looks good, but I noticed one bug and a few cosmetic improvements. > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -1914,17 +1914,18 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr, > } > > oid.hash[0] = subdir_nr; > + strbuf_add(path, "/", 1); Maybe: strbuf_addch(path, '/'); would be a bit more readable (it's also a tiny bit faster, but this isn't part of the tight loop). > + baselen = path->len; We set this here so that the '/' is included as part of the base. Makes sense, but can we now drop the earlier setting of baselen before the opendir() call? > while ((de = readdir(dir))) { > if (is_dot_or_dotdot(de->d_name)) > continue; > > - strbuf_setlen(path, baselen); > - strbuf_addf(path, "/%s", de->d_name); > - > if (strlen(de->d_name) == GIT_SHA1_HEXSZ - 2 && > !hex_to_bytes(oid.hash + 1, de->d_name, > GIT_SHA1_RAWSZ - 1)) { > + strbuf_setlen(path, baselen); > + strbuf_add(path, de->d_name, GIT_SHA1_HEXSZ - 2); Moving this code into the conditional makes sense here, since that's where we know the actual size. But what about the case when we _don't_ hit this conditional. We still do: if (cruft_cb) cruft_cb(path->buf); which is now broken (it will just get the directory name without the entry appended). I think the optimized versions probably needs to be something more like: while (de = readdir(...)) { strbuf_setlen(path, baselen); if (size is HEXSZ - 2) { strbuf_add(path, de->d_name, HEXSZ - 2); obj_cb(path->buf); } else if (cruft_cb) { strbuf_addstr(path, de->d_name); cruft_cb(path->buf); } } Two other thoughts: - from past discussions, I suspect most of your performance improvement actually comes from avoiding addf(), and the difference between addstr() and add() may be negligible here. It might be worth timing strbuf_addstr(). If that's similarly fast, that means we could keep the logic out of the conditional entirely. - there's an extra micro-optimization there, which is that if there's no obj_cb, we have no need to assemble the full path at all. I doubt it makes much of a difference, as most callers would pass an object callback (I'd be surprised if we even have one that doesn't). -Peff