Re: [PATCH] Split packs from git-repack should have descending timestamps

On 5/24/07, Junio C Hamano <junkio@xxxxxxx> wrote:
"Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes:
> Dana How <danahow@xxxxxxxxx> wrote:
>>
>> If git-repack produces multiple split packs because
>> --max-pack-size was in effect,  the first pack written
>> should have the latest timestamp because:
>> (1) sha1_file.c:rearrange_packed_git() puts more recent
>>     pack files at the beginning of the search list;  and
>> (2) the most recent objects are written out first
>>     while packing.
>
> Ack.  Given our mtime based sorting routine, even without your
> recent patch to improve it, I think we definitely want this type
> of behavior built into git-repack.sh.  Good follow-on to your
> --max-pack-size series.

Gee, I do not want to touch this, unless we can do something
about that sleep 2, even if you have & at the end (actually,
especially because you have that -- it makes me worried).

At the minimum, I think you do not have to restamp at all if the
result is a single pack (i.e. the usual case), like so:

case "$restamp" in
?*' '?*)
        # we have more than one.
        # for split packs,  the first created should have most recent timestamp
        for file in $restamp ; do touch "$file"; sleep 2; done &
        ;;
esac

Come to think of it, can't you do this "re-touching" business at
the end of pack-objects without sleeping?  You could keep track
of the names of the packs you produced, and if you have produced
5, like so:

        1
        2
        3
        4
        5

you would swap timestamp of #1 and #5, #2 and #4 using stat()
and utime(), and you are done.  Each of these huge packs would
take more than one second to write it out, but if that is not
the case, you could even start with timestamp of #5, subtract 1
and stamp #4, subtract 1 and stamp #3, ... You may end up using
timestamp from the past, but that would not be a problem.

OK, this triggered the following argument which convinces me:
git-pack-objects really should guarantee the correct timestamp
order,  otherwise some other caller will have to repeat the stuff
I tried to put in git-repack.sh .  So I will resubmit following Junio's
suggestions.  This won't be for a few days.
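
A rough sketch of the timestamp reversal Junio describes, written as
shell rather than inside git-pack-objects with stat() and utime().
The helper name reverse_mtimes is made up for illustration, and it
assumes mktemp -d is available; none of this is from the actual patch.
It takes the packs in the order they were written and hands their
timestamps back in the opposite order, so no sleep is needed:

```shell
#!/bin/sh
# reverse_mtimes FILE...
# Hypothetical helper (not part of git): reverse the order of timestamps
# across the given files, so the first file ends up with the last file's
# timestamp, the second with the second-to-last, and so on.
reverse_mtimes () {
	# Assumption: mktemp -d exists on this system.
	tmpdir=$(mktemp -d) || return 1
	n=0
	# Save each file's timestamp on a reference file named by position.
	for f in "$@"
	do
		n=$((n + 1))
		touch -r "$f" "$tmpdir/$n" || { rm -rf "$tmpdir"; return 1; }
	done
	# Hand the saved timestamps back in the opposite order.
	for f in "$@"
	do
		touch -r "$tmpdir/$n" "$f" || { rm -rf "$tmpdir"; return 1; }
		n=$((n - 1))
	done
	rm -rf "$tmpdir"
}
```

Called as "reverse_mtimes pack-1 pack-2 pack-3" (placeholder names),
the first pack written would end up with the most recent timestamp.
With an odd number of packs the middle one keeps its own timestamp.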

Also,  if there are rules on allowable bash constructs
(POSIX only, no &, etc),  perhaps they should go in
SubmittingPatches near the new C99 comments?

And I am really hoping that the other "use object density in
reordering" patch would make this irrelevant.  You would have
commit and then the rest in the normal input object stream, and
recency ordering done by git-pack-objects should keep commits
together early in the resulting split pack, and earlier parts
that have the commits would be hopefully denser.

I understand your point, but for a "normal" yet extremely
large repository this may not be the case.  The "object density"
patch is designed so that the density component of the sort
key is extremely weak -- I think the timestamp is very revealing,
and should be followed in the absence of large variations
in object density.  Correcting the timestamps makes sure
that the timestamp order corresponds sensibly to recency order
when packs are split.  A sequence of user commands producing
packfiles results in sensible and usable timestamps;
I'd just like to make sure this is also true when packs are
split.

Anyway,  I'm not going to submit anything more about
timestamps or object density until I see reactions to both patches,
since they interact.
--
Dana L. How  danahow@xxxxxxxxx  +1 650 804 5991 cell