On Mon, Feb 22, 2021 at 08:06:16PM -0500, Taylor Blau wrote: > > > ++static int pack_mtime_cmp(const void *_a, const void *_b) > > > ++{ > > > ++ struct packed_git *a = ((const struct string_list_item*)_a)->util; > > > ++ struct packed_git *b = ((const struct string_list_item*)_b)->util; > > > ++ > > > ++ if (a->mtime < b->mtime) > > > ++ return -1; > > > ++ else if (b->mtime < a->mtime) > > > ++ return 1; > > > ++ else > > > ++ return 0; > > > ++} > > > ++ > > > > Does that give us the packs in increasing chronological order, but then > > decreasing chronological order within the packs themselves? > > I agree we should be sorting and not blindly accepting the order that > the caller gave us, but... > > "chronological order within the packs themselves" confuses me. I think > that you mean ordering objects within a pack by their offsets. If so, > then yes: this gives you the oldest pack first (and all of its objects > in their original order), then the second oldest (and all of its > objects) and so on. > > Could you clarify a bit how you'd expect to sort the objects in two > packs? Yes, by "within the packs themselves" I meant the physical order of objects within an individual pack (sorted by their offsets, as we'd get from for_each_object_in_pack). We would generally expect that to be "newest first" within a given pack (modulo some other heuristics, but we generally follow traversal order from rev-list). So if the packs themselves are in oldest-first order, won't that create a weird discontinuity at the pack boundaries? E.g., imagine we have a linear sequence of commits A..Z in chronological order, stored in two packs of equal size. Something like: tick=1234567890 commit() { tick=$((tick+10)) export GIT_COMMITTER_DATE="@$tick +0000" git commit --allow-empty -m $1 } for i in $(perl -le 'print for A..M'); do commit $i; done git repack -d sleep 5 for i in $(perl -le 'print for N..Z'); do commit $i; done git repack -d Since "repack -d" will use a traversal to decide which objects to pack, the two packs will have their commits in reverse chronological order: M..A and Z..N. You can verify that with: for idx in $(ls -rt .git/objects/pack/*.idx); do stat --format='==> %y %n' $idx git show-index <$idx | sort -n | awk '{print $2}' | git --no-pager log --no-walk=unsorted --stdin --format=%s done And if we then ran "git repack -ad" to make a new pack, it would be in newest-to-oldest Z..A order. But if instead we concatenate the packs after sorting them in oldest-first order, we'll end up with a pack that contains M..A, then Z..N. We instead want newest packs first (and then newest objects within that pack, which is the pack order), then oldest. In other words, I think your comparison function should be reversed (return "1" when a->mtime < b->mtime). (Of course these orders aren't perfect; in a real pack you'd have non-commit objects, and we'd tweak the write order to keep delta families together, etc. But our "best guess" should keep packs and objects-within-packs consistent in newest-first order). -Peff