Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 28, 2017 at 01:42:44PM -0800, Junio C Hamano wrote:

> Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:
> 
> > It could be argued that in the future, Git might need to distinguish
> > tree_objects from blob_objects - in particular, a user might want
> > rev-list to print the trees but not the blobs. 
> 
> That was exactly why these bits were originally made to "appear
> independent but in practice nobody sets only one and leaves others
> off".  
> 
> And it didn't happen in the past 10 years, which tells us that we
> should take this patch.

I actually have a patch which uses the distinction. It's for
upload-archive doing reachability checks (which seems rather familiar to
what's going on here).

The whole series (from 2013!) is at:

  git://github.com/peff/git jk/archive-reachability

but the relevant commits are below.

I don't think the same logic holds for this case, though, because
somebody actually can ask for a single blob.

-- >8 --
From: Jeff King <peff@xxxxxxxx>
Date: Wed, 5 Jun 2013 17:57:02 -0400
Subject: [PATCH] list-objects: optimize "revs->blob_objects = 0" case

If we are traversing trees during a "--objects"
traversal, we may skip blobs if the "blob_objects" field of
rev_info is not set. But we do so as the first thing in
process_blob(), only after we have actually created the
"struct blob" object, incurring a hash lookup. We can
optimize out this no-op call completely.

This does not actually affect any current code, as all of
the current traversals always set blob_objects when looking
at objects, anyway.

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
 list-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/list-objects.c b/list-objects.c
index f3ca6aafb..58ad69557 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -117,7 +117,7 @@ static void process_tree(struct rev_info *revs,
 			process_gitlink(revs, entry.oid->hash,
 					show, base, entry.path,
 					cb_data);
-		else
+		else if (revs->blob_objects)
 			process_blob(revs,
 				     lookup_blob(entry.oid->hash),
 				     show, base, entry.path,
-- 
2.12.0.359.gd4c8c42e9

-- >8 --
From: Jeff King <peff@xxxxxxxx>
Date: Wed, 5 Jun 2013 18:02:42 -0400
Subject: [PATCH] archive: ignore blob objects when checking reachability

We cannot create an archive from a blob object, so we would
not expect anyone to provide one to us. And if they do, we
will fail anyway just after the reachability check.  We can
therefore optimize our reachability check to ignore blobs
completely, and not even create a "struct blob" for them.

Depending on the repository size and the exact place we find
the reachable object in the traversal, this can save 20-25%,
a we can avoid many lookups in the object hash.

The downside of this is that a blob provided to a remote
archive process will fail with "no such object" rather than
"object is not a tree" (we could organize the code to retain
the old message, but since we no longer know whether the
blob is reachable or not, we would potentially be leaking
information about the existence of unreachable objects).

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
 archive.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/archive.c b/archive.c
index ef89b2556..489115f9f 100644
--- a/archive.c
+++ b/archive.c
@@ -383,6 +383,7 @@ static int object_is_reachable(const unsigned char *sha1)
 	save_commit_buffer = 0;
 	init_revisions(&data.revs, NULL);
 	setup_revisions(ARRAY_SIZE(argv) - 1, argv, &data.revs, NULL);
+	data.revs.blob_objects = 0;
 	if (prepare_revision_walk(&data.revs))
 		return 0;
 
-- 
2.12.0.359.gd4c8c42e9




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]