Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1

Stefan Beller <sbeller@xxxxxxxxxx> · Mon, 20 Nov 2017 12:47:53 -0800

On Mon, Nov 20, 2017 at 12:35 PM, Jeff King <peff@xxxxxxxx> wrote:
> In theory nobody should ever ask the low-level object code
> for a null sha1. It's used as a sentinel for "no such
> object" in lots of places, so leaking through to this level
> is a sign that the higher-level code is not being careful
> about its error-checking.  In practice, though, quite a few
> code paths seem to rely on the null sha1 lookup failing as a
> way to quietly propagate non-existence (e.g., by feeding it
> to lookup_commit_reference_gently(), which then returns
> NULL).
>
> When this happens, we do two inefficient things:
>
>   1. We actually search for the null sha1 in packs and in
>      the loose object directory.
>
>   2. When we fail to find it, we re-scan the pack directory
>      in case a simultaneous repack happened to move it from
>      loose to packed.
>
> It's debatable whether (1) is a good idea or not. The
> performance drop is at least linear in the number of
> lookups, so it's not too bad. And if by some 2^-160th chance
> somebody does have such an object, we probably ought to
> access it. On the other hand, enough code paths treat the
> null sha1 specially that the object probably isn't usable,
> anyway.
>
> Problem (2), on the other hand, is a pretty severe
> performance issue. If you have a large number of packs,
> rescanning the pack directory can be very expensive. And it
> only helps in the case that you have an object with the null
> sha1 _and_ somebody was simultaneously repacking it. The
> tradeoff is certainly a bad one there.
>
> In an ideal world, we'd simply fix all of the callers to
> notice the null sha1 and avoid passing it to us. But a
> simple experiment to catch this with a BUG() shows that
> there are a large number of code paths.
>
> In the meantime, let's address problem (2). That's the
> minimal fix we can do to make the performance problems go
> away. p5551 shows this off (when a fetched ref is new, the
> "old" sha1 is 0{40}, which ends up being passed for
> fast-forward checks, the status table abbreviations, etc):
>
>   Test            HEAD^             HEAD
>   --------------------------------------------------------
>   5551.4: fetch   5.51(5.03+0.48)   0.17(0.10+0.06) -96.9%
>
> We could address (1), too, but there's not much performance
> improvement left to make.
>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
> This is the minimal fix that addresses the performance issues.
> I'd actually have no problem at all declaring that looking up a null
> sha1 is insane, and having the object-lookup routines simply return "no
> such object" without even doing the loose/pack lookup first.
>
> I'm also fine with going down the BUG() route and fixing the
> higher-level callers, but it's a big enough task (with little enough
> real-world impact) that I think it would be worth applying this in the
> meantime.

It would have a lot of impact in the future, when new developers
are hindered mis-using the API. The (unwritten) series with introducing
BUG() would help a lot in 'holding it right' and I would expect fewer
performance
regressions over time.

The patch is impressively small for such a performance gain.
Personally I think (1) (which essentially means "making null sha1
work like a regular sha1") is quite an endeavor at this point in time
for this code base.

As a tangent, a similar but solved problem in the diff code is how
NUL in user data is treated in xdiff for example, as there we kept
being careful since the beginning (though I think we don't have tests
for it, so it might be broken)

Stefan