Re: timestamps not git-cloned

Thomas Rast <trast@xxxxxxxxxxxxxxx> · Sat, 29 Nov 2008 11:16:58 +0100

Stephen R. van den Berg wrote:
> Chris Frey wrote:
> >If this is the important bit, perhaps git-archive could be changed
> >to create tarballs with file timestamps based on their commit dates.
> 
> Based on the principle of least surprise, I'd consider this a rather good
> idea.

Unless I'm missing something, this would make git-archive rather more
expensive than it is now: tree objects do not record any timestamps,
so figuring out the last commit that changed a file requires a full
history walk in the worst case[*].  (This is another side effect of
not versioning files.)  By contrast, the running time of the current
git-archive depends only on the size of the given tree-ish, including
all of its subtrees and blobs.
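To make the cost argument concrete, here is a toy model in Python of what per-file timestamps would require. It is a sketch under stated assumptions, not how git or git-archive is implemented: commits are modelled by a hypothetical `Commit` class holding a flat path-to-blob-id map and a single parent (merges are ignored), and `last_change_times` is a made-up helper. It walks history newest-first, attributes each path to the first commit whose blob differs from its parent's, and can only stop once every path in HEAD has been attributed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    """Hypothetical stand-in for a commit: a flat snapshot plus one parent.

    Like a real tree object, the snapshot maps paths to blob ids and
    records no per-file timestamps; only the commit carries a date.
    """
    time: int                        # commit date, e.g. seconds since the epoch
    tree: dict[str, str]             # path -> blob id
    parent: Optional[Commit] = None  # None for the root commit


def last_change_times(head: Commit) -> tuple[dict[str, int], int]:
    """Map each path in HEAD to the date of the last commit that changed it.

    Also returns how many commits had to be examined. The walk stops early
    only when every path has been attributed, so a single file untouched
    since the root commit forces a walk over the whole (first-parent) history.
    """
    pending = set(head.tree)
    times: dict[str, int] = {}
    examined = 0
    commit: Optional[Commit] = head
    while commit is not None and pending:
        examined += 1
        parent_tree = commit.parent.tree if commit.parent else {}
        for path in list(pending):
            # Walking newest-first, the first commit whose blob differs from
            # its parent's is the last commit that changed this path.
            if commit.tree.get(path) != parent_tree.get(path):
                times[path] = commit.time
                pending.discard(path)
        commit = commit.parent
    return times, examined


# "Makefile" never changes after the root commit, so all three commits are read.
root = Commit(100, {"README": "r1", "Makefile": "m1"})
c2 = Commit(200, {"README": "r2", "Makefile": "m1"}, root)
head = Commit(300, {"README": "r3", "Makefile": "m1"}, c2)
times, examined = last_change_times(head)
print(times, examined)
```

The early exit only pays off when every file has changed recently; one long-untouched file, like `Makefile` here, is enough to force the full walk.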

My unscientific guesstimates on how much work this would be, in a
random (old) linux-2.6 clone:

  $ git rev-parse HEAD
  e013e13bf605b9e6b702adffbe2853cfc60e7806
  $ time git ls-tree -r -t $(git rev-list HEAD~5000..HEAD) >/dev/null

  real    0m1.385s
  user    0m1.164s
  sys     0m0.220s
  $ git rev-list HEAD | wc -l
  117812

Scaling the 5000-commit sample linearly to all 117812 commits (a factor
of about 23.6), reading (and dumping) all those trees and subtrees would
incur a penalty on the order of 30 seconds.  Compare that to the current
running time of git-archive:

  $ time git archive --format=tar HEAD >/dev/null

  real    0m2.790s
  user    0m2.684s
  sys     0m0.072s

Of course, the ratio will keep getting worse as history gets longer.
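The extrapolation and the widening gap can be written out as a back-of-the-envelope model. This is a sketch that assumes, like the estimate above, that tree-reading cost grows linearly with the number of commits, and additionally that the git-archive time stays fixed (i.e. the tree itself does not grow); the constants are the timings from the transcripts, and `walk_seconds` and `slowdown` are illustrative names only.

```python
# Timings taken from the transcripts above (real time, in seconds).
SAMPLE_SECONDS = 1.385    # git ls-tree over the HEAD~5000..HEAD sample
SAMPLE_COMMITS = 5000
ARCHIVE_SECONDS = 2.790   # git archive --format=tar HEAD


def walk_seconds(total_commits: int) -> float:
    """Linear extrapolation of the tree-reading cost to a full history walk."""
    return SAMPLE_SECONDS * total_commits / SAMPLE_COMMITS


def slowdown(total_commits: int) -> float:
    """Estimated walk cost relative to today's git-archive run.

    Assumes the archive cost stays constant, i.e. the tree does not grow.
    """
    return walk_seconds(total_commits) / ARCHIVE_SECONDS


# Current history, and a hypothetical history twice as long.
for n in (117812, 2 * 117812):
    print(f"{n} commits: ~{walk_seconds(n):.1f}s walk, "
          f"~{slowdown(n):.1f}x git-archive")
```

Under these assumptions the walk already costs roughly 12 times the present git-archive run, and the factor doubles whenever the history does.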

- Thomas

[*] I think a true "worst case" here needs, in every leaf directory, at
least one file that has not changed since the root commit (so the walk
cannot stop before reaching the root) and another that changes in every
commit (so every subtree differs and really has to be read).

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
