Jeff King <peff@xxxxxxxx> writes: > On Thu, Aug 05, 2010 at 11:47:09AM -0700, Junio C Hamano wrote: > >> My gut feeling is that it is probably Ok if contains() and its >> recursive helper are moved to builtin/tag.c and are made static, to >> make it clear that this should not be reused outside the current >> context as a generic "contains" function. It would probably help to >> have a comment at the end of list_tags() to say that TMP_MARK _ought_ >> to be cleaned before leaving the function but we don't do that because >> we know it is the last function in the callchain before we exit. After thinking about this a bit more, I changed my mind. I think depth-first traversal from all tag tips, without running any traversal from the wanted commit, has a serious downside. What happens when you run "git tag --contains master" in a Linus tree after a major release but before he tags the -rc1 and closes the merge window? Doesn't the algorithm run all the way down to the root, only to say nothing? Admittedly the traversal will visit each commit once (starting from v2.6.12 down to v2.6.12-rc2, then skipping v2.6.12-rc2 through v2.6.12-rc6 because they are already seen and known not to contain the "master", then starting from v2.6.13 down to v2.6.12, and so on), but visiting all the commits down to root feels really wrong. I think the merge-base traversal is essential. Suppose we have hundreds of commits with two tags 'T1' and 'T2': o---T2 o---o---Y / / ---o---o---o---o---o---o---o---%---...---X---*---T1 If you are probing for 'X', traversing from all the tags until you hit a wanted commit will stop at a reasonable point (namely, 'X') if you are lucky and you started from 'T1' not 'T2', with your new algorithm. If you are unlucky and started from 'T2', you will however traverse down to root from 'T2' which would be the bulk of the history. Also, when you are probing for 'Y', you end up traversing all the way down, whether you start from 'T1' or 'T2', don't you? You can (and should) dig both from 'Y' and 'T1' and stop traversal at '*', as that commit is an ancestor of 'Y' and at that point you have proven that any ancestor of that you will find by traversing further will not find 'Y'. Similarly for 'Y' and 'T2' pair, which should stop at '%'. As the author of show-branch, I think I know one trick that may speed this up without compromising the worst case scenario. We can run this traversal in parallel for many tags at once. Use bit #0 of object.flags for the wanted commit, and bit #1..bit #N for N tags (if you have more tags than that fit in the flags word, you run the algorithm multiple times, N tags at a time, clearing the flags after each round). The parallel traversal will run, smudging the parent commit's flags with those of the child, in the usual way. Suppose that you are probing for 'Y'. You start traversal from 'Y', 'T1' and 'T2' (as two tags will fit comfortably in a single flags word). When you come to commit '%', you will notice that all the ancestors of that commit are contained in the wanted commit and all the tags you are probing for ('T1' and 'T2'). There is no point to go further than that point during this round, as you know you will not find 'Y' by digging further from that point. In general, when you find that a commit has both bit #0 and bit #i set, that commit is known to be a common ancestor between 'Y' and 'Ti', so there is no point going further from that point to see if 'Ti' contains 'Y'---you know it doesn't. So you can stop when you see bit 0..N are all set. After traversing as many commits as possible this way, look at the bits 1..i that is given to 'Y'. They represent the set of tags that contain 'Y' (you will find only bit #0 is set on 'Y'). Similarly, if you are probing for 'X', start the traversal from 'X', 'T1' and 'T2'. The traversal that begins at 'T1' will reach 'X' (so 'X' will initially start with bit #0 set, but then when the traversal that started from 'T1' reaches it, its bit #1 gets set) eventually. Because bit #2 is not set at that point, you don't stop and keep traversing. You may already have started marking 'X' and its parent with bit #0 with the traversal you started at 'X', but when you reach 'X' from 'T1' again, you will be changing bit #1; anytime you change the flag, you would push the commit back to the stack and re-traverse it. Eventually, you will smudge '%' with all 3 bits (depending on clock skew, you might traverse a few more levels of its parents) and you can stop. The flag word on 'X' will have bis #0 and #1 set, so you know 'T1' reaches it but 'T2' doesn't. To optimize this further, you may want to take advantage of the fact that tags are not supposed to change, and have a cache that knows what tag is contained in what other tag (e.g. if v1.0 is contained in v2.0, you do not have to run the probe for v2.0 once you know the wanted commit is reachable from v1.0). But that can come later, I think. But I haven't thought things really through. If I am lucky, I may be able to find time tomorrow during the day to try conjuring something out, but no promises... -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html