Re: Finer timestamps and serialization in git

"Eric S. Raymond" <esr@xxxxxxxxxxx> · Wed, 15 May 2019 19:32:30 -0400

Derrick Stolee <stolee@xxxxxxxxx>:
> On 5/15/2019 3:16 PM, Eric S. Raymond wrote:
> > The deeper problem is that I want something from Git that I cannot
> > have with 1-second granularity. That is: a unique timestamp on each
> > commit in a repository.
> 
> This is impossible in a distributed version control system like Git
> (where the commits are immutable). No matter your precision, there is
> a chance that two machines commit at the exact same moment on two different
> machines and then those commits are merged into the same branch.

It's easy to work around that problem. Each git daemon has to single-thread
its handling of incoming commits at some level, because you need a lock on the
file system to guarantee consistent updates to it.

So if a commit comes in whose date would be the same as that of the
previous commit on the current branch, you bump the incoming commit timestamp.
That's the simple case. The complicated case is checking for date
collisions on *other* branches. But there are ways to make that fast,
too. There's a very obvious one involving a presort that is O(log2
n) in the number of commits.
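
A minimal sketch of the bump-on-collision idea, in Python. Everything
named here (TimestampAllocator, allocate) is hypothetical and not part
of git. It assumes timestamps are integers in some fine-grained unit
and that the receiving side serializes calls, as described above.

```python
import bisect


class TimestampAllocator:
    """Hypothetical helper: hands out repository-unique commit timestamps."""

    def __init__(self, existing=()):
        # The presort: keep every timestamp already in use, sorted and
        # deduplicated, across all branches.
        self._used = sorted(set(existing))

    def allocate(self, wanted):
        """Return `wanted`, bumped forward one unit at a time until it is free."""
        ts = wanted
        # Binary search for the first used timestamp >= wanted.
        i = bisect.bisect_left(self._used, ts)
        # Walk past any run of consecutive used values starting at `ts`.
        while i < len(self._used) and self._used[i] == ts:
            ts += 1
            i += 1
        # Record the new timestamp, keeping the list sorted.
        self._used.insert(i, ts)
        return ts


alloc = TimestampAllocator([100, 101, 103])
first = alloc.allocate(100)   # 100 and 101 are taken, so this is bumped
second = alloc.allocate(99)   # free, so returned unchanged
```

The lookup is a binary search, so it is logarithmic in the number of
commits. The list insert in this sketch is linear; a balanced tree or
similar structure would be needed to keep the whole operation
logarithmic.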

I wouldn't have brought this up in the first place if I didn't have a
pretty clear idea how to do it in code!

> Even when you specify a committer, there are many environments where a set
> of parallel machines are creating commits with the same identity.

If those commit sets become the same commit in the final graph, this is
not a problem for total ordering.

> > Why do I want this? There are a number of reasons, all related to a
> > mathematical concept called "total ordering".  At present, commits in
> > a Git repository only have partial ordering. 
> 
> This is true of any directed acyclic graph. If you want a total ordering
> that is completely unambiguous, then you should think about maintaining
> a linear commit history by requiring rebasing instead of merging.

Excuse me, but your premise is incorrect.  A git DAG isn't just "any" DAG.
The presence of timestamps makes a total ordering possible.

(I was a theoretical mathematician in a former life. This is all very
familiar ground to me.)

> > One consequence is that
> > action stamps - the committer/date pairs I use as VCS-independent commit
> > identifications in reposurgeon - are not unique.  When a patch sequence
> > is applied, it can easily happen fast enough to give several successive
> > commits the same committer-ID and timestamp.
> 
> Sorting by committer/date pairs sounds like an unhelpful idea, as that
> does not take any graph topology into account. It happens that a commit
> can actually have an _earlier_ commit date than its parent.

Yes, I'm aware of that.  The uniqueness properties that make a total
ordering desirable are not actually dependent on timestamp order
coinciding with topo order.
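
To illustrate the uniqueness point, here is a rough sketch in Python
that finds committer/date pairs shared by more than one commit. The
function name and the dict layout of a commit are assumptions made for
the example, not reposurgeon's actual data model.

```python
from collections import Counter


def duplicate_action_stamps(commits):
    """Return {(committer, date): count} for every pair that occurs more than once.

    `commits` is assumed to be an iterable of dicts with "committer" and
    "date" keys; this layout is hypothetical.
    """
    counts = Counter((c["committer"], c["date"]) for c in commits)
    return {stamp: n for stamp, n in counts.items() if n > 1}


# A patch series applied within one second collides; note that the check
# never looks at parent/child order.
sample = [
    {"committer": "dev@example.com", "date": 1557963150},
    {"committer": "dev@example.com", "date": 1557963150},
    {"committer": "dev@example.com", "date": 1557963151},
]
collisions = duplicate_action_stamps(sample)
```

With unique timestamps the result would always be empty, whether or
not date order agrees with topo order.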

> Changing the granularity of timestamps requires changing the commit format,
> which is probably a non-starter.

That's why I started by noting that you're going to have to break the
format anyway to move to an ECDSA hash (or whatever you end up using).

I'm saying that *since you'll need to do that anyway*, it's a good time
to think about making timestamps finer-grained and unique.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>