Re: [PATCH 5/5] sha1_file: don't re-scan pack directory for null sha1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 21, 2017 at 02:20:19PM +0900, Junio C Hamano wrote:

> After queuing this series to an earlier part of 'pu' and resolving a
> conflict with jh/fsck-promisors topic, I ended up with a code that
> rejects 0{40} a lot earlier, before we try to see if a pack entry
> for 0{40} exists, even though the patch that is queued on this topic
> is more conservative (i.e. the above one).
> 
> Perhaps we would want to use the alternate version that declares the
> 0{40} is a sentinel that signals that there is no such object in
> this topic---that would give us a consistent semantics without
> having to adjust jh/fsck-promisors when it becomes ready to be
> merged.

I think we could adjust the patches without too much effort. That said,
I am very on-the-fence about whether to just declare the null sha1 as
"not a real object" and automatically return from the lookup without
doing any work. But if you agree that it's not likely to hurt anything,
it's IMHO a lot easier to reason about.

Here's a re-roll of patch 5 that behaves this way (the patch should be
unsurprising, but I've updated the commit message to match).

I did notice one other thing: the function looks up replace objects, so
we have both the original and the replaced sha1 available. My earlier
patch used the original sha1, but this one uses the replaced value.
I'm not sure if that's sane or not. It lets the fast-path kick in if you
point a replace ref at 0{40}. And it lets you point refs/replace/0{40}
to something else. I doubt that's useful, but it could perhaps be for
debugging, etc.

In most cases, of course, it wouldn't matter (since pointing to or from
the null sha1 is vaguely crazy in the first place).

-- >8 --
Subject: [PATCH v2 5/5] sha1_file: fast-path null sha1 as a missing object

In theory nobody should ever ask the low-level object code
for a null sha1. It's used as a sentinel for "no such
object" in lots of places, so leaking through to this level
is a sign that the higher-level code is not being careful
about its error-checking.  In practice, though, quite a few
code paths seem to rely on the null sha1 lookup failing as a
way to quietly propagate non-existence (e.g., by feeding it
to lookup_commit_reference_gently(), which then returns
NULL).

When this happens, we do two inefficient things:

  1. We actually search for the null sha1 in packs and in
     the loose object directory.

  2. When we fail to find it, we re-scan the pack directory
     in case a simultaneous repack happened to move it from
     loose to packed. This can be very expensive if you have
     a large number of packs.

Only the second one actually causes noticeable performance
problems, so we could treat them independently. But for the
sake of simplicity (both of code and of reasoning about it),
it makes sense to just declare that the null sha1 cannot be
a real on-disk object, and looking it up will always return
"no such object".

There's no real loss of functionality to do so Its use as a
sentinel value means that anybody who is unlucky enough to
hit the 2^-160th chance of generating an object with that
sha1 is already going to find the object largely unusable.

In an ideal world, we'd simply fix all of the callers to
notice the null sha1 and avoid passing it to us. But a
simple experiment to catch this with a BUG() shows that
there are a large number of code paths that do so.

So in the meantime, let's fix the performance problem by
taking a fast exit from the object lookup when we see a null
sha1. p5551 shows off the improvement (when a fetched ref is
new, the "old" sha1 is 0{40}, which ends up being passed for
fast-forward checks, the status table abbreviations, etc):

  Test            HEAD^             HEAD
  --------------------------------------------------------
  5551.4: fetch   5.51(5.03+0.48)   0.17(0.10+0.06) -96.9%

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
 sha1_file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sha1_file.c b/sha1_file.c
index 8a7c6b7eba..ae4b71f445 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1152,6 +1152,9 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi,
 				    lookup_replace_object(sha1) :
 				    sha1;
 
+	if (is_null_sha1(real))
+		return -1;
+
 	if (!oi)
 		oi = &blank_oi;
 
-- 
2.15.0.578.g35b419775f




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux