Re: Git commit generation numbers

Phil Hord <hordp@xxxxxxxxx> · Wed, 20 Jul 2011 20:39:28 -0400

On 07/20/2011 08:18 PM, david@xxxxxxx wrote:
On Wed, 20 Jul 2011, Phil Hord wrote:

On 07/20/2011 07:36 PM, Nicolas Pitre wrote:
On Wed, 20 Jul 2011, david@xxxxxxx wrote:

If the generation number is part of the repository then it's going to
be the same for everyone.
The actual generation number will be, and has to be, the same for
everyone with the same repository content, regardless of the cache 
used.
It is a well-defined number with no room for interpretation.

Nonsense.

Even if the generation number is well-defined and shared by all 
clients, the only quasi-essential definition is "for each A in 
ancestors_of(B), gen(A) < gen(B)".

In practice, the actual generation number *will be the same* for 
everyone with the same repository content, unless and until someone 
develops a different calculation method.  But there is no reason to 
require that the number *has to be* the same for everyone unless you 
expect (or require) everyone to share their gen-caches.
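
To make that concrete, here is a minimal sketch in Python. The toy history,
the helper names, and the convention that root commits get generation 1 are
all assumptions made for illustration, not anything taken from git's code.
It computes the "obvious" numbering (1 + the largest parent generation) and
then shows that a second, different numbering satisfies the same ordering
rule.

```python
from typing import Dict, List, Set

# Hypothetical toy history: commit name -> list of parent names.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "merge": ["a", "b"],
    "tip": ["merge"],
}


def generations(parents: Dict[str, List[str]]) -> Dict[str, int]:
    """gen(c) = 1 + max(gen(p) for each parent p); roots get 1 (assumed)."""
    gen: Dict[str, int] = {}
    for start in parents:
        stack = [start]
        while stack:
            c = stack[-1]
            if c in gen:
                stack.pop()
                continue
            pending = [p for p in parents[c] if p not in gen]
            if pending:
                # Parents must be numbered before the child.
                stack.extend(pending)
            else:
                gen[c] = 1 + max((gen[p] for p in parents[c]), default=0)
                stack.pop()
    return gen


def ancestors(parents: Dict[str, List[str]], commit: str) -> Set[str]:
    """All commits reachable from `commit` through parent links."""
    seen: Set[str] = set()
    stack = list(parents[commit])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def satisfies_invariant(parents: Dict[str, List[str]],
                        gen: Dict[str, int]) -> bool:
    """Check: for each A in ancestors_of(B), gen(A) < gen(B)."""
    return all(gen[a] < gen[b]
               for b in parents
               for a in ancestors(parents, b))


GEN = generations(PARENTS)

# A different numbering (here, simply a topological position) that is
# just as valid under the ordering rule alone.
ALT = {"root": 1, "a": 2, "b": 3, "merge": 4, "tip": 5}
```

Both GEN and ALT pass satisfies_invariant(), yet they disagree about "b".
Either one would do for a purely local cache.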

and I think this is why Linus is not happy with a cache. He is seeing 
this as something that has significantly more value if it is going to 
be consistent in a distributed manner than if it's just something 
calculated locally that can be different from other systems.

It will only be used locally, so it needn't be consistent with anyone 
else's.

if it's just locally generated, then I could easily see generation 
numbers being different on different people's systems, depending on the 
order that they see commits (either locally generated or pulled from 
others)

If it's part of the commit, then as that commit gets propagated the 
generation number gets propagated as well, and every repository will 
agree on what the generation number is for any commit that's shared.

I agree that this consistency guarantee seems to be valuable.

I can't see why.

Surely there will be a competent and efficient gen-cache API.  But 
most code can just ask if B --contains A or even just use rev-list 
and benefit from the increased speed of the answer.  Because most 
code doesn't really care about the gen numbers themselves, but only 
the speed of determining ancestry.
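
As a rough illustration of where the speed comes from, the sketch below
answers "does B contain A?" and uses generation numbers only to prune the
walk. The toy history, the GEN table, and the is_ancestor() helper are
hypothetical; this is not git's implementation, and it treats a commit as
containing itself.

```python
from typing import Dict, List, Set

# Hypothetical toy history: commit name -> list of parent names.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "merge": ["a", "b"],
    "tip": ["merge"],
}

# Assumed cached generation numbers (roots at 1, child = 1 + max parent).
GEN: Dict[str, int] = {"root": 1, "a": 2, "b": 2, "merge": 3, "tip": 4}


def is_ancestor(parents: Dict[str, List[str]], gen: Dict[str, int],
                a: str, b: str) -> bool:
    """Return True if a is b, or a is reachable from b via parent links."""
    if a == b:
        return True
    # An ancestor always has a strictly smaller generation number, so
    # this case needs no walk at all.
    if gen[a] >= gen[b]:
        return False
    seen: Set[str] = set()
    stack = [b]
    while stack:
        c = stack.pop()
        for p in parents[c]:
            if p == a:
                return True
            # Only commits with a larger generation than a can still
            # have a among their ancestors; skip everything else.
            if gen[p] > gen[a] and p not in seen:
                seen.add(p)
                stack.append(p)
    return False
```

The caller never looks at the numbers; it only sees a yes or no that
arrives sooner, because the walk stops descending once it drops to A's
generation or below.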

in that case, why bother with generation numbers at all? the improved 
data based heuristic seems to solve that problem.

Does it?  Surely the ruckus would've died down in that case.  But I 
haven't been reading pu.

It seems to me that the main drawback to a gen-cache is that it slows 
down the first operation after even a local clone (with just hardlinks).

On the other hand, I see too many nails in the distributed-gen-numbers 
coffin:  legacy commits can't catch up (and therefore suffer), and 
legacy clients can trash or corrupt even "new-style" commits.

Phil
