Re: [PATCH 1/2] list-objects: drop --unpacked non-commit objects from results

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 07, 2023 at 11:21:29AM +0900, Junio C Hamano wrote:

> > In git-rev-list(1), we describe the `--unpacked` option as:
> >
> >     Only useful with `--objects`; print the object IDs that are not in
> >     packs.
> >
> > This is true of commits, which we discard via get_commit_action(), but
> > not of the objects they reach. So if we ask for an --objects traversal
> > with --unpacked, we may get arbitrarily many objects which are indeed
> > packed.
> 
> Strictly speaking, as long as all the objects that are not in packs
> are shown, "print the object IDs that are not in packs" is satisfied.
> With this fix, perhaps we would want to tighten the explanation a
> little bit while we are at it.  Perhaps
> 
> 	print the object names but exclude those that are in packs
> 
> or something along that line?

I think using the word "exclude" is a good idea, as it makes it clear
that we are omitting objects that otherwise would be traversed (as
opposed to just showing unpacked objects, reachable or not).

But I wanted to point out one other subtlety here. The existing code
(before this patch) checks for already-packed commits, and avoids adding
them to the traversal. The problem this patch is fixing is that we may
see objects they point to via other non-packed commits. But the opposite
problem exists, too: we have unpacked objects that are reachable from
those packed commits.

It's probably reasonably rare, since we _tend_ to make packs by rolling
up reachable chunks of history. But that's not a guarantee. One way I
can think of for it to happen in practice is that somebody pushes (or
fetches) a thin pack with commit C as a delta against an unpacked C'. In
that case "index-pack --fix-thin" will create a duplicate of C' in the
new pack, but its trees and blobs may remain unpacked.

I think with the patch in this series we could actually drop that "do
not traverse commits that are unpacked" line of code, and end up "more
correct". But I suspect performance of an incremental "git repack -d"
would suffer. This is kind of analagous to the "we do not traverse every
UNINTERESTING commit just to mark its trees/blobs as UNINTERESTING"
optimization. We know that it is not a true set difference, but it is OK
in practice and it buys us a lot of performance. And just like that
case, bitmaps do let us cheaply compute the true set difference. ;)

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux