Re: Finer timestamps and serialization in git

Michal Suchánek <msuchanek@xxxxxxx> · Mon, 20 May 2019 16:41:34 +0200

On Mon, 20 May 2019 10:14:17 -0400
"Eric S. Raymond" <esr@xxxxxxxxxxx> wrote:

> Jakub Narebski <jnareb@xxxxxxxxx>:
> > > What "commits that follow it?" By hypothesis, the incoming commit's
> > > timestamp is bumped (if it's bumped) when it's first added to a branch
> > > or branches, before there are following commits in the DAG.  
> > 
> > Errr... the main problem is with the distributed nature of Git, i.e. when
> > two repositories create different commits with the same
> > committer+timestamp value.  You receive commits on fetch or push, and
> > you receive many commits at once.
> > 
> > Say you have two repositories, and the history looks like this:
> > 
> >  repo A:   1<---2<---a<---x<---c<---d      <- master
> > 
> >  repo B:   1<---2<---X<---3<---4           <- master
> > 
> > When you push from repo A to repo B, or fetch in repo B from repo A you
> > would get the following DAG of revisions
> > 
> >  repo B:   1<---2<---X<---3<---4           <- master
> >                  \
> >                   \--a<---x<---c<---d      <- repo_A/master
> > 
> > Now let's assume that commits X and x have the same committer and the
> > same fractional timestamp, while being different commits.  Then you
> > would need to bump timestamp of 'x', changing the commit.  This means
> > that 'c' needs to be rewritten too, and 'd' also:
> > 
> >  repo B:   1<---2<---X<---3<---4           <- master
> >                  \
> >                   \--a<---x'<--c'<--d'     <- repo_A/master  
> 
> Of course that's true.  But you were talking as though all those commits
> have to be modified *after they're in the DAG*, and that's not the case.
> If any timestamp has to be modified, it only has to happen *once*, at the
> time its commit enters the repo.

And that's where you get it wrong. Git is *distributed*. There is more
than one repository, and each repository has its own DAG, independent
of the other repositories and their DAGs. So when you push your history
to another repository and the timestamps change as a result, what ends
up in the other repository is not the history you pushed. The
repositories diverge, and you no longer know what is what.
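
A toy model makes the propagation concrete. The Python sketch below is
not git's code. It hashes simplified commit objects using git's framing
(SHA-1 over "commit <size>\0" followed by the body), with a made-up
identity, made-up timestamps, and commit "2" standing in as the shared
base. Bumping only x's timestamp changes x's id, and because every child
records its parent's id, c and d change too:

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's well-known empty tree id
IDENT = "A U Thor <author@example.com>"  # hypothetical committer identity


def commit_id(tree, parent, ident, timestamp, message):
    """Git-style SHA-1 id of a simplified commit object.

    Uses git's framing, sha1(b"commit <size>\\0" + body), with author and
    committer set to the same identity and the timezone fixed at +0000.
    """
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append(f"author {ident} {timestamp} +0000")
    lines.append(f"committer {ident} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    return hashlib.sha1(f"commit {len(body)}\0".encode() + body).hexdigest()


def build_chain(base, commits):
    """Return ids for a linear chain of (timestamp, message) commits on base."""
    ids, parent = [], base
    for timestamp, message in commits:
        parent = commit_id(EMPTY_TREE, parent, IDENT, timestamp, message)
        ids.append(parent)
    return ids


base = commit_id(EMPTY_TREE, None, IDENT, 1558360000, "2")

# History a <- x <- c <- d as it exists in repo A.
pushed = build_chain(base, [(1558360010, "a"), (1558360020, "x"),
                            (1558360030, "c"), (1558360040, "d")])

# The same history if the receiving side bumps only x's timestamp.
received = build_chain(base, [(1558360010, "a"), (1558360021, "x"),
                              (1558360030, "c"), (1558360040, "d")])

for name, old, new in zip("axcd", pushed, received):
    print(name, old[:12], new[:12], "same" if old == new else "CHANGED")
```

Run as-is, it reports a as unchanged and x, c and d as CHANGED: the
receiver ends up holding a different history from the one that was
pushed.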

> 
> Actually, in the normal case only x would need to be modified. The only
> way c would need to be modified is if bumping x's timestamp caused an
> actual collision with c's.
> 
> I don't see any conceptual problem with this.  You appear to me to be
> confusing two issues.  Yes, bumping timestamps would mean that all
> hashes downstream in the Merkle tree would be generated differently,
> even when there's no timestamp collision, but so what?  The hash of a
> commit isn't portable to begin with - it can't be, because AFAIK
> there's no guarantee that the ancestry parts of the DAG in two
> repositories where copies of it live contain all the same commits and
> topo relationships.

Today, if you push from one repository to another, you get the exact
same history with the exact same hashes. So the hashes are portable
across repositories that share history. With your proposed change,
hashes can be modified on push/pull, so the repositories no longer
share history and the hashes become non-portable. That's why it is a
bad idea.
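
The same toy framing shows why ids are portable today: a push copies
object bytes unchanged, so the receiver computes the same id and the
sender's tip names the same commit on both sides. Below, a dict stands
in for an object store, and the `rewrite` parameter is a hypothetical
stand-in for a receiver that bumps timestamps on arrival. Neither is a
git API; this is only a sketch of the argument.

```python
import hashlib


def object_id(kind, body):
    """Git-style object id: SHA-1 over b"<type> <size>\\0" + body."""
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()


def store(repo, kind, body):
    """Add an object to a toy object store (a dict keyed by id)."""
    oid = object_id(kind, body)
    repo[oid] = (kind, body)
    return oid


def push(src, dst, rewrite=None):
    """Copy every object from src into dst.

    rewrite is a hypothetical parameter modelling the proposed timestamp
    bump on receipt; with rewrite=None the bytes are copied verbatim.
    """
    for kind, body in list(src.values()):
        store(dst, kind, rewrite(body) if rewrite else body)


commit = (b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
          b"author A U Thor <author@example.com> 1558360020 +0000\n"
          b"committer A U Thor <author@example.com> 1558360020 +0000\n"
          b"\nx\n")

repo_a, repo_b, repo_c = {}, {}, {}
tip = store(repo_a, "commit", commit)

push(repo_a, repo_b)  # current behaviour: objects arrive byte-for-byte
push(repo_a, repo_c, rewrite=lambda b: b.replace(b" 1558360020 ", b" 1558360021 "))

print(tip in repo_b)  # True: A's tip id names the same commit in B
print(tip in repo_c)  # False: C only holds a rewritten commit with a new id
```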

Commits are currently identified by their hash, so the hash must not
change during push/pull. Changing the identifier to something else
(e.g. a hash of the content without (some) of the metadata) might make
the identifier more stable, but it brings other problems when you need
two different identifiers for the same content to include it in two
unrelated histories.
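
A sketch of that trade-off, under loud assumptions: `full_id` is a
simplified stand-in for today's commit id (it covers the parent and the
committer timestamp), and `content_id` is a hypothetical identifier over
the tree and message only. The content id survives a timestamp bump,
but the same content committed on top of two unrelated parents collapses
to a single id:

```python
import hashlib
from typing import NamedTuple


class Commit(NamedTuple):
    tree: str
    parent: str
    committer: str
    timestamp: int
    message: str


def full_id(c):
    """Simplified stand-in for today's id: covers parent and metadata."""
    body = (f"tree {c.tree}\nparent {c.parent}\n"
            f"committer {c.committer} {c.timestamp} +0000\n\n{c.message}\n")
    return hashlib.sha1(body.encode()).hexdigest()


def content_id(c):
    """Hypothetical alternative id: tree and message only."""
    return hashlib.sha1(f"tree {c.tree}\n\n{c.message}\n".encode()).hexdigest()


TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree id
WHO = "A U Thor <author@example.com>"               # hypothetical identity

x1 = Commit(TREE, "1" * 40, WHO, 1558360020, "x")  # x in one history (hypothetical parent id)
x1_bumped = x1._replace(timestamp=1558360021)      # the same x after a bump
x2 = x1._replace(parent="2" * 40)                  # same content, unrelated history

print(full_id(x1) == full_id(x1_bumped))        # False: a bump changes today's id
print(content_id(x1) == content_id(x1_bumped))  # True: the content id survives the bump
print(content_id(x1) == content_id(x2))         # True: but two commits now share one id
```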

Thanks

Michal