On Tue, Nov 07, 2023 at 11:21:29AM +0900, Junio C Hamano wrote:

> > In git-rev-list(1), we describe the `--unpacked` option as:
> >
> >     Only useful with `--objects`; print the object IDs that are not in
> >     packs.
> >
> > This is true of commits, which we discard via get_commit_action(), but
> > not of the objects they reach. So if we ask for an --objects traversal
> > with --unpacked, we may get arbitrarily many objects which are indeed
> > packed.
>
> Strictly speaking, as long as all the objects that are not in packs
> are shown, "print the object IDs that are not in packs" is satisfied.
> With this fix, perhaps we would want to tighten the explanation a
> little bit while we are at it. Perhaps
>
>     print the object names but exclude those that are in packs
>
> or something along that line?

I think using the word "exclude" is a good idea, as it makes it clear
that we are omitting objects that otherwise would be traversed (as
opposed to just showing unpacked objects, reachable or not).

But I wanted to point out one other subtlety here. The existing code
(before this patch) checks for already-packed commits, and avoids adding
them to the traversal. The problem this patch is fixing is that we may
see objects they point to via other non-packed commits.

But the opposite problem exists, too: we may have unpacked objects that
are reachable only from those packed commits. It's probably reasonably
rare, since we _tend_ to make packs by rolling up reachable chunks of
history. But that's not a guarantee. One way I can think of for it to
happen in practice is that somebody pushes (or fetches) a thin pack with
commit C as a delta against an unpacked C'. In that case "index-pack
--fix-thin" will create a duplicate of C' in the new pack, but its trees
and blobs may remain unpacked.

I think with the patch in this series we could actually drop that "do
not traverse commits that are packed" line of code, and end up "more
correct".
But I suspect performance of an incremental "git repack -d" would
suffer.

This is kind of analogous to the "we do not traverse every UNINTERESTING
commit just to mark its trees/blobs as UNINTERESTING" optimization. We
know that it is not a true set difference, but it is OK in practice and
it buys us a lot of performance. And just like that case, bitmaps do let
us cheaply compute the true set difference. ;)

-Peff
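
To make the three behaviors above concrete, here is a toy model in
Python. It is only a sketch and not how git's revision machinery is
implemented: the object names, the PACKED set and the walk() helper are
all made up for illustration. It also assumes that a skipped commit
still has its parents walked and only its own trees/blobs are left out.
Commit A is packed but its blobY is still loose, and the unpacked commit
B shares the packed blobX with it.

```python
# Toy model only: the object names and the walk() helper are hypothetical,
# and this is not how git's revision machinery is implemented.

# Commit -> (parents, trees/blobs it references).
COMMITS = {
    "A": ([], ["treeA", "blobX", "blobY"]),
    "B": (["A"], ["treeB", "blobX"]),
}

# A was rolled into a pack, but blobY was left loose.
PACKED = {"A", "treeA", "blobX"}


def walk(tip, skip_packed_commits, exclude_packed_objects):
    """Return the sorted object names an --objects style walk would print."""
    shown = set()
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        parents, objects = COMMITS[commit]
        stack.extend(parents)  # assumption: parents are walked either way
        if skip_packed_commits and commit in PACKED:
            continue  # do not show it, and do not look at its trees/blobs
        for name in [commit] + objects:
            if exclude_packed_objects and name in PACKED:
                continue
            shown.add(name)
    return sorted(shown)


# Old behavior: packed commits are skipped, but objects are not filtered.
before_patch = walk("B", skip_packed_commits=True, exclude_packed_objects=False)
# Patched behavior: packed commits are skipped and packed objects excluded.
with_patch = walk("B", skip_packed_commits=True, exclude_packed_objects=True)
# True set difference: walk everything, then exclude whatever is packed.
true_difference = walk("B", skip_packed_commits=False, exclude_packed_objects=True)

print(before_patch)     # ['B', 'blobX', 'treeB']
print(with_patch)       # ['B', 'treeB']
print(true_difference)  # ['B', 'blobY', 'treeB']
```

The first result still contains the packed blobX, the second drops it
but never finds the loose blobY, and only the third (which has to open
the packed commit) finds it. In this model that extra walk is the cost
side of the trade-off.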