Re: Git should preserve modification times at least on request

Peter Backes <rtc@xxxxxxxxxxxxxxxxxxx> · Wed, 21 Feb 2018 23:14:20 +0100

On Wed, Feb 21, 2018 at 10:33:05PM +0100, Ævar Arnfjörð Bjarmason wrote:
> This sounds like a sensible job for a git import tool, i.e. import a
> target directory into git, and instead of 'git add'-ing the whole thing
> it would look at the mtimes, sort files by mtime, then add them in order
> and only commit those files that had the same mtime in the same commit
> (or within some boundary).

I think that this would be The Wrong Thing to do.

The commit time is just that: The time the commit was done. The commit 
is an atomic group of changes to a number of files that hopefully bring 
the tree from one usable state into the next.

The mtime, in contrast, tells us when a file was most recently modified.

It may well be that main.c was most recently modified yesterday, and 
feature.c was modified this morning, and that only both changes taken 
together make sense as a commit, despite the long time in between.

Even worse, it may be that feature A took a long time to implement, so 
we have huge gaps in between the mtimes, but feature B was quickly done 
after A was finished. Such an algorithm would probably split feature A 
incorrectly into several commits, and group the more recently changed 
files of feature A with those of feature B.

And if Feature A and Feature B were developed in parallel, things get 
completely messy.
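
To make the feature A / feature B case concrete, here is a rough sketch 
of what such a gap-based grouping might look like. The function name, 
the file names, the mtimes and the 24-hour boundary are all made up for 
illustration; this is not an existing git tool, just one guess at the 
"same mtime or within some boundary" idea.

```python
from typing import Dict, List

HOUR = 3600.0


def group_by_mtime_gap(mtimes: Dict[str, float], max_gap: float) -> List[List[str]]:
    """Sort files by mtime; start a new group whenever the gap to the
    previously seen file exceeds max_gap (hypothetical heuristic)."""
    ordered = sorted(mtimes.items(), key=lambda item: item[1])
    groups: List[List[str]] = []
    last_mtime = None
    for name, mtime in ordered:
        if last_mtime is None or mtime - last_mtime > max_gap:
            groups.append([])  # gap too large: open a new would-be commit
        groups[-1].append(name)
        last_mtime = mtime
    return groups


# Invented mtimes: feature A took 20 days, feature B was done one hour
# after A was finished.
example_mtimes = {
    "feature_a_core.c": 0 * HOUR,
    "feature_a_util.c": 240 * HOUR,
    "feature_a_final.c": 480 * HOUR,
    "feature_b.c": 481 * HOUR,
}

proposed_commits = group_by_mtime_gap(example_mtimes, max_gap=24 * HOUR)
for files in proposed_commits:
    print(files)
```

With these numbers, feature A ends up in three separate groups, and its 
last file lands in the same group as feature_b.c. A larger boundary 
would merge more of A, but then it would also merge B into it.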

> The advantage of doing this via such a tool is that you could tweak it
> to commit by any criteria you wanted, e.g. not mtime but ctime or even
> atime.

Maybe, but it would be rather useless to commit by ctime or atime. You 
do one grep -r and the atime is different. You do one chmod or chown 
and the ctime is different. Those timestamps are really only useful for 
very limited purposes.
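
The chmod case is easy to check. The following is a minimal sketch, 
assuming a POSIX system where st_ctime is the inode change time (on 
Windows it means something else); the function name is made up for this 
example. atime is left out because its behaviour depends on mount 
options such as relatime or noatime.

```python
import os
import stat
import tempfile
import time


def chmod_effect_on_timestamps():
    """Return (mtime_unchanged, ctime_changed) after a chmod on a
    temporary file. Assumes POSIX ctime semantics."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        before = os.stat(path)
        # Wait a little so a ctime update is not hidden by coarse
        # filesystem timestamp granularity.
        time.sleep(0.05)
        # Toggle the owner's execute bit; the file contents stay untouched.
        os.chmod(path, stat.S_IMODE(before.st_mode) ^ stat.S_IXUSR)
        after = os.stat(path)
        return (
            before.st_mtime_ns == after.st_mtime_ns,
            before.st_ctime_ns != after.st_ctime_ns,
        )
    finally:
        os.remove(path)


mtime_unchanged, ctime_changed = chmod_effect_on_timestamps()
print("mtime unchanged:", mtime_unchanged)
print("ctime changed:", ctime_changed)
```

The content of the file never changes here, so any tool that grouped by 
ctime would be reacting to permission changes, not to edits.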

That ctime exists seems reasonable, since it's only ever updated when 
the inode is written anyway, so recording it costs no extra disk write.

atime, in contrast, was clearly one of the rather nonsensical 
innovations of UNIX: Do one write to the disk for each read from the 
disk. C'mon, really? It would have been a lot more reasonable to simply 
provide a generic way for tracing read() system calls instead; then 
userspace could decide what to do with that information and which of it 
is useful and should be kept and perhaps stored on disk. Now we have 
this ugly hack called relatime to deal with the problem.

> You'd get the same thing as you'd get if git's tree format would change
> to include mtimes (which isn't going to happen), but with a lot more
> flexibility.

Well, from basic logic, I don't see how a decision not to implement a 
feature could possibly increase flexibility. The opposite seems to be 
the case.

Best wishes
Peter

-- 
Peter Backes, rtc@xxxxxxxxxxxxxxxxxxx