A commit's generation is its height in the history graph, as measured from the farthest root. It is defined as: - If the commit has no parents, then its generation is 0. - Otherwise, its generation is 1 more than the maximum of its parents generations. The following diagram shows a sample history with generations: A(0)--B(1)--C(2)------G(5)--H(6) \ / D(2)--E(3)--F(4) Note that C and D have the same generation, as they are both children of B. Note also that the merge commit G's generation is 5, not 3, as we take the maximum of its parents' generations. Generation numbers can be useful for bounding traversals. For example, if we have two commits with generations 500 and 600, we know that the second cannot be an ancestor of the first. The first could be an ancestor of the second, but we can't know unless we traverse the history graph. However, when walking backwards from the "600" commit, once we reach generation "499", we know that the "500" commit cannot be an ancestor of the "499" commit, and we can stop the traversal without even looking at the earlier parts of the history. We already do something similar with commit timestamps in many traversals. However, timestamps are somewhat untrustworthy, as we have to deal with clock skew and with imports from buggy systems. Generation numbers are easy to calculate recursively, though you have to go to the roots to do so. This patch calculates and stores them in a persistent cache. It uses a simple recursive algorithm; you could probably drop the recursion by topologically sorting a list of all commits and filling in generation numbers left to right. But the recursive definition coupled with the cache make it very cheap to calculate generation numbers for new commits at the tip of history (you only have to traverse back to the last cached parents). We could also store generation numbers in the commit header directly. These would be faster to look at than an external cache (they would be on par speed-wise with commit timestamps). But there are a few reasons not to: 1. The reason to avoid commit timestamps is that they are unreliable. Generation numbers would probably be more reliable, but they are still redundant with the actual graph structure represented by the parent pointers, and can therefore be out of sync with the parent information. By calculating them ourselves, we know they are correct. 2. With grafts and replacement objects, the graph structure (and thus the generation numbers) can be changed. So the generation number, while immutable for a given commit object, can be changed when we "lie" about the graph structure via these mechanisms. Being able to simply clear the cache when these things are changed is helpful. 3. There's a lot of existing git history without generation headers. So we'd still need to have the same cache to handle those cases. 4. It generally pollutes the header with redundant information, which we try to avoid. Putting it in the commit header is purely a speedup, and it seems we can get similar performance with a generation cache. Signed-off-by: Jeff King <peff@xxxxxxxx> --- commit.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ commit.h | 2 ++ 2 files changed, 50 insertions(+), 0 deletions(-) diff --git a/commit.c b/commit.c index ac337c7..d412e05 100644 --- a/commit.c +++ b/commit.c @@ -6,6 +6,7 @@ #include "diff.h" #include "revision.h" #include "notes.h" +#include "object-cache.h" int save_commit_buffer = 1; @@ -878,3 +879,50 @@ int commit_tree(const char *msg, unsigned char *tree, strbuf_release(&buffer); return result; } + +static struct object_cache generations; + +static unsigned long commit_generation_recurse(struct commit *c) +{ + struct commit_list *p; + uint32_t r; + + if (!object_cache_lookup_uint32(&generations, &c->object, &r)) + return r; + + if (parse_commit(c) < 0) + die("unable to parse commit: %s", sha1_to_hex(c->object.sha1)); + + if (!c->parents) + return 0; + + r = 0; + for (p = c->parents; p; p = p->next) { + unsigned long pgen = commit_generation_recurse(p->item); + if (pgen > r) + r = pgen; + } + r++; + + object_cache_add_uint32(&generations, &c->object, r); + return r; +} + +static void write_generation_cache(void) +{ + object_cache_write(&generations, "generations"); +} + +unsigned long commit_generation(const struct commit *commit) +{ + static int initialized; + + if (!initialized) { + object_cache_init(&generations, "generations", sizeof(uint32_t)); + atexit(write_generation_cache); + initialized = 1; + } + + /* drop const because we may call parse_commit */ + return commit_generation_recurse((struct commit *)commit); +} diff --git a/commit.h b/commit.h index a2d571b..bff6b36 100644 --- a/commit.h +++ b/commit.h @@ -176,4 +176,6 @@ extern int commit_tree(const char *msg, unsigned char *tree, struct commit_list *parents, unsigned char *ret, const char *author); +unsigned long commit_generation(const struct commit *commit); + #endif /* COMMIT_H */ -- 1.7.6.37.g989c6 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html