On 5/24/07, Junio C Hamano <junkio@xxxxxxx> wrote:
"Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes: > Dana How <danahow@xxxxxxxxx> wrote: >> >> If git-repack produces multiple split packs because >> --max-pack-size was in effect, the first pack written >> should have the latest timestamp because: >> (1) sha1_file.c:rearrange_packed_git() puts more recent >> pack files at the beginning of the search list; and >> (2) the most recent objects are written out first >> while packing. > > Ack. Given our mtime based sorting routine, even without your > recent patch to improve it, I think we definately want this type > of behavior built into git-repack.sh. Good follow-on to your > --max-pack-size series. Gee, I do not want to touch this, unless we can do something about that sleep 2, even if you have & at the end (actually, especially because you have that -- it makes me worried). At the minimum, I think you do not have to restamp at all if the result is a single pack (i.e. the usual case), like so: case "$restamp" in ?*' '?*) # we have more than one. # for split packs, the first created should have most recent timestamp for file in $restamp ; do touch $file; sleep 2; done & ;; esac Come to think of it, can't you do this "re-touching" business at the end of pack-objects without sleeping? You could keep track of the names of the packs you produced, and if you have produced 5, like so: 1 2 3 4 5 you would swap timestamp of #1 and #5, #2 and #4 using stat() and utime(), and you are done. Each of these huge packs would take more than one second to write it out, but if that is not the case, you could even start with timestamp of #5, subtract 1 and stamp #4, subtract 1 and stamp #3, ... You may end up using timestamp from the past, but that would not be a problem.
OK, this triggered the following argument, which convinces me:
git-pack-objects really should guarantee the correct timestamp order;
otherwise some other caller will have to repeat the stuff I tried
to put in git-repack.sh.  So I will resubmit following Junio's
suggestions.  This won't be for a few days.

Also, if there are rules on allowable bash constructs
(POSIX only, no &, etc.), perhaps they should go in
SubmittingPatches near the new C99 comments?
> And I am really hoping that the other "use object density in
> reordering" patch would make this irrelevant.  You would have
> commit and then the rest in the normal input object stream, and
> recency ordering done by git-pack-objects should keep commits
> together early in the resulting split pack, and earlier parts
> that have the commits would be hopefully denser.
I understand your point, but for a "normal" yet extremely large
repository this may not be the case.  The "object density" patch
is designed so that the density component of the sort key
is extremely weak -- I think the timestamp is very revealing,
and should be followed in the absence of large variations
in object density.

Correcting the timestamps makes sure that the timestamp order
corresponds sensibly to recency order when packs are split.
A sequence of user commands producing packfiles results in
sensible and usable timestamps; I'd just like to make sure
this is also true when packs are split.

Anyway, I'm not going to submit anything more about timestamps
or object density until I see reactions to both patches, since
they interact.
-- 
Dana L. How  danahow@xxxxxxxxx  +1 650 804 5991 cell