Re: [PATCH v3 0/8] repack: support repacking into a geometric sequence

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 22, 2021 at 08:06:16PM -0500, Taylor Blau wrote:

> > >     ++static int pack_mtime_cmp(const void *_a, const void *_b)
> > >     ++{
> > >     ++        struct packed_git *a = ((const struct string_list_item*)_a)->util;
> > >     ++        struct packed_git *b = ((const struct string_list_item*)_b)->util;
> > >     ++
> > >     ++        if (a->mtime < b->mtime)
> > >     ++                return -1;
> > >     ++        else if (b->mtime < a->mtime)
> > >     ++                return 1;
> > >     ++        else
> > >     ++                return 0;
> > >     ++}
> > >     ++
> >
> > Does that give us the packs in increasing chronological order, but then
> > decreasing chronological order within the packs themselves?
> 
> I agree we should be sorting and not blindly accepting the order that
> the caller gave us, but...
> 
> "chronological order within the packs themselves" confuses me. I think
> that you mean ordering objects within a pack by their offsets. If so,
> then yes: this gives you the oldest pack first (and all of its objects
> in their original order), then the second oldest (and all of its
> objects) and so on.
> 
> Could you clarify a bit how you'd expect to sort the objects in two
> packs?

Yes, by "within the packs themselves" I meant the physical order of
objects within an individual pack (sorted by their offsets, as we'd get
from for_each_object_in_pack). We would generally expect that to be
"newest first" within a given pack (modulo some other heuristics, but we
generally follow traversal order from rev-list).

So if the packs themselves are in oldest-first order, won't that create
a weird discontinuity at the pack boundaries?

E.g., imagine we have a linear sequence of commits A..Z in chronological
order, stored in two packs of equal size. Something like:

  tick=1234567890
  commit() {
    tick=$((tick+10))
    export GIT_COMMITTER_DATE="@$tick +0000"
    git commit --allow-empty -m $1
  }

  for i in $(perl -le 'print for A..M'); do commit $i; done
  git repack -d
  sleep 5
  for i in $(perl -le 'print for N..Z'); do commit $i; done
  git repack -d

Since "repack -d" will use a traversal to decide which objects to pack,
the two packs will have their commits in reverse chronological order:
M..A and Z..N. You can verify that with:

  for idx in $(ls -rt .git/objects/pack/*.idx); do
    stat --format='==> %y %n' $idx
    git show-index <$idx |
    sort -n |
    awk '{print $2}' |
    git --no-pager log --no-walk=unsorted --stdin --format=%s
  done

And if we then ran "git repack -ad" to make a new pack, it would be in
newest-to-oldest Z..A order.

But if instead we concatenate the packs after sorting them in
oldest-first order, we'll end up with a pack that contains M..A, then
Z..N. We instead want newest packs first (and then newest objects within
that pack, which is the pack order), then oldest.

In other words, I think your comparison function should be reversed
(return "1" when a->mtime < b->mtime).

(Of course these orders aren't perfect; in a real pack you'd have
non-commit objects, and we'd tweak the write order to keep delta
families together, etc. But our "best guess" should keep packs and
objects-within-packs consistent in newest-first order).

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux