Re: [PATCH/RFC 2/2] Make path-limiting be incremental when possible.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Linus Torvalds <torvalds@xxxxxxxx> writes:

> The reason I put "RFC" in the subject line is that while I've validated it 
> various ways, like doing
>
> 	git-rev-list HEAD -- drivers/char/ | md5sum 
>
> before-and-after on the kernel archive, it's "git-rev-list" after all. In 
> other words, it's that really really subtle and complex central piece of 
> software. So while I think this is important and should go in asap, I also 
> think it should get lots of testing and eyeballs looking at the code.

Let me make sure I understand what is going on.

Currently, limit_list(), which is called when revs->limited is
set, serves two different purposes:

 * traverse the ancestry chain and mark ancestors of
   uninteresting commits also uninteresting;

 * "simplify" each commit that is still interesting, by
   comparing the tree object of it and its parents.

We used to call limit_list() when we are limiting the commits by
paths, because that was what the latter does as a side effect of
add_parents_to_list().  You made it streamable by moving the
call to get_revision() and not calling limit_list() when you do
not have other reasons to call it.

You currently do not do this streaming optimization when you
have to show simplified parents, because the streaming version
traverses ancestry chain one step at a time as it goes, and you
cannot obviously see who the final "fake" parent is until you
simplify the parents enouogh.

Now, I have some observations, not necessarily limited to this
patch but also to the code from the "master" version.

 * get_commit_reference() sets "limited" when the user gives ^exclude
   commit.  no question about this --- we would need to see the
   reachability in this case.

 * when we are going to sort the result we are going to show,
   we set "limited" -- we cannot sort without knowing the set we
   are sorting.

 * giving commit timestamp limits via --max-age, --min-age
   etc. sets "limited".  I have doubts about this.  We currently
   look at the commit timestamp in limit_list() and mark a
   commit old enough to be "uninteresting" -- which makes
   ancestor of such commit also uninteresting.  Similarly
   commits that are too young are not pushed into the result.

   There is a filter in get_revision() to omit ineligible
   commits by time range already, so I wonder if we can remove
   that code from limit_list() and not set "limited" when these
   timestamp ranges are given.  This would let us stream even
   more.

   Admittably, ancestors of an old commit had better be even
   older, so marking them uninteresting to stop the traversal
   early is a good way to optimize the full-traversal case, but
   not having to call limit_list() would help streaming usage.

   Also I have doubts about correctness issue about the max-age
   handling.  Is it correct to mark a commit that is old enough
   to be uninteresting, implying that its ancestors are all
   uninteresting?  With clock skew among people, it is possible
   to make a merge commit that has parents one of which is "in
   the future".

 * As to handling of --unpacked, exactly the same comment as
   "max-age" applies, but it might be even worse.  Marking an
   unpacked one "uninteresting" means if a packed commit refers
   to loose commit, the loose one will be also marked
   uninteresting, I think.

"--objects --unpacked" is an idiom to repack objects that are
still loose, but I suspect it would do interesting thing if the
commit is in a pack but its trees and blobs are still loose.  Am
I confused, or have I just spotted a potential (but hard to
trigger) bug?


--
diff --git a/revision.c b/revision.c
index 0e3f074..a55c0d1 100644
--- a/revision.c
+++ b/revision.c
@@ -404,10 +404,6 @@ static void limit_list(struct rev_info *
 		list = list->next;
 		free(entry);
 
-		if (revs->max_age != -1 && (commit->date < revs->max_age))
-			obj->flags |= UNINTERESTING;
-		if (revs->unpacked && has_sha1_pack(obj->sha1))
-			obj->flags |= UNINTERESTING;
 		add_parents_to_list(revs, commit, &list);
 		if (obj->flags & UNINTERESTING) {
 			mark_parents_uninteresting(commit);
@@ -415,8 +411,6 @@ static void limit_list(struct rev_info *
 				break;
 			continue;
 		}
-		if (revs->min_age != -1 && (commit->date > revs->min_age))
-			continue;
 		p = &commit_list_insert(commit, p)->next;
 	}
 	if (revs->boundary) {
@@ -543,32 +537,26 @@ int setup_revisions(int argc, const char
 			}
 			if (!strncmp(arg, "--max-age=", 10)) {
 				revs->max_age = atoi(arg + 10);
-				revs->limited = 1;
 				continue;
 			}
 			if (!strncmp(arg, "--min-age=", 10)) {
 				revs->min_age = atoi(arg + 10);
-				revs->limited = 1;
 				continue;
 			}
 			if (!strncmp(arg, "--since=", 8)) {
 				revs->max_age = approxidate(arg + 8);
-				revs->limited = 1;
 				continue;
 			}
 			if (!strncmp(arg, "--after=", 8)) {
 				revs->max_age = approxidate(arg + 8);
-				revs->limited = 1;
 				continue;
 			}
 			if (!strncmp(arg, "--before=", 9)) {
 				revs->min_age = approxidate(arg + 9);
-				revs->limited = 1;
 				continue;
 			}
 			if (!strncmp(arg, "--until=", 8)) {
 				revs->min_age = approxidate(arg + 8);
-				revs->limited = 1;
 				continue;
 			}
 			if (!strcmp(arg, "--all")) {
@@ -635,7 +623,6 @@ int setup_revisions(int argc, const char
 			}
 			if (!strcmp(arg, "--unpacked")) {
 				revs->unpacked = 1;
-				revs->limited = 1;
 				continue;
 			}
 			*unrecognized++ = arg;
@@ -786,7 +773,9 @@ struct commit *get_revision(struct rev_i
 		if (revs->min_age != -1 && (commit->date > revs->min_age))
 			continue;
 		if (revs->max_age != -1 && (commit->date < revs->max_age))
-			return NULL;
+			continue;
+		if (revs->unpacked && has_sha1_pack(commit->object.sha1))
+			continue;
 		if (revs->no_merges &&
 		    commit->parents && commit->parents->next)
 			continue;

-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]