On 29.06.2016 22:39, Junio C Hamano wrote:
Stefan Beller <sbeller@xxxxxxxxxx> writes:
On Wed, Jun 29, 2016 at 11:59 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
On Wed, Jun 29, 2016 at 11:31 AM, Marc Strapetz
<marc.strapetz@xxxxxxxxxxx> wrote:
This is no RFE but rather recurring thoughts whenever I'm working with
commit graphs: a topological index attribute for commit objects would be
incredibly useful. By "topological index" I mean a simple integer for which
the following condition holds true:
Look for "generation numbers" in the list archive, perhaps?
Thanks for the pointer to the interesting discussions.
In http://www.spinics.net/lists/git/msg161363.html
Linus wrote in a discussion with Jeff:
Right now, we do *have* a "generation number". It's just that it's
very easy to corrupt even by mistake. It's called "committer date". We
could improve on it.
Would it make sense to refuse creating commits that have a commit date
prior to their parents' commit dates (except when the user gives a
`--dammit-I-know-I-break-a-wildy-used-heuristic`)?
I think that has also been discussed in the past. I do not think it
would help very much in practice, as projects already have up to 10
years (and the ones migrated from CVS, even more) worth of commits
they cannot rewrite that may record incorrect committer dates.
You'd need something like "you can trust committer dates that are
newer than this date" per project to switch between slow path and
fast path, with an updated fsck that knows how to compute that
number after you pulled from somebody who used that overriding
option.
If the use of generation number can somehow be limited narrowly, we
may be able to incrementally introduce it only for new commits, but
I haven't thought things through, so let me do so aloud here ;-)
Suppose we use it only for this purpose:
* When we have two commits, C1 and C2, with generation numbers G1
and G2, we can say "C1 cannot possibly be an ancestor of C2" if
G1 > G2. We cannot say anything else based on generation
numbers (or lack thereof).
then I think we could just say "A newly created commit must record
generation number G that is larger than generation numbers of its
parent commits; ignore parents that lack generation number for the
purpose of this sentence".
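
To make the rule above concrete, here is a minimal sketch in Python. It is
not Git code: the function names are hypothetical, a missing generation
number is modelled as None, and picking "max + 1" is an assumption, since
the rule only asks for a number larger than those of the parents.

```python
from typing import Iterable, Optional


def new_generation(parent_generations: Iterable[Optional[int]]) -> int:
    """Pick a generation number for a newly created commit.

    Parents that lack a generation number (None) are ignored, as in
    the rule quoted above. A commit with no usable parents gets 1.
    """
    known = [g for g in parent_generations if g is not None]
    return max(known, default=0) + 1


def cannot_be_ancestor(g1: Optional[int], g2: Optional[int]) -> bool:
    """Return True only if the numbers say C1 cannot be an ancestor of C2.

    False means "nothing can be concluded", not "C1 is an ancestor".
    """
    if g1 is None or g2 is None:
        return False  # a missing number tells us nothing
    return g1 > g2
```

With parents at generations 3 and 7 plus one parent without a number, the
new commit would record 8. A False result from cannot_be_ancestor() only
means the caller has to fall back to walking the graph.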
From an algorithmic perspective, for already existing repositories you would
still have to switch from optimized generation number code to the
current commit-time based code. That could make things even more complex
and it's possibly expensive to determine whether a repository has full
generation number support or not.
On the other hand, for new repositories, you could immediately use
generation number based algorithms. So it could be "A newly created
commit must record generation number G that is larger than generation
numbers of its parent commits if all parent commits have a generation
number recorded; otherwise do not record a generation number". Something
like "git filter-branch" might already be sufficient to convert
repositories.
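
A minimal sketch of this stricter variant, again in Python rather than Git
code. The function name is hypothetical, None stands for "no generation
number recorded", and "max + 1" is just one way to satisfy "larger than".

```python
from typing import Iterable, Optional


def new_generation_strict(
    parent_generations: Iterable[Optional[int]],
) -> Optional[int]:
    """Record a generation number only if every parent has one.

    Returns None (record nothing) as soon as any parent lacks a
    generation number. A root commit has no parents and gets 1.
    """
    parents = list(parent_generations)
    if any(g is None for g in parents):
        return None
    return max(parents, default=0) + 1
```

Under this variant a commit either carries a number backed by all of its
parents or carries none at all, so commits built on top of unconverted
history would stay unnumbered until that history is rewritten.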
Git versions released in 2019 may start issuing warnings if HEAD has no
generation number assigned and Git versions released in 2025 may
completely refuse to work with such repositories.
In the interim period, a local cache as Jeff is proposing could serve as
a secondary source for generation numbers. This would allow phasing out
the current algorithms immediately.
-Marc