1. Jonathan N: Git has a flexible packfile format, unlike CVS, where
things are stored as deltas against the next revision of the same file.
GC can be a huge operation if it’s not done regularly. "git gc" makes
one huge pack. Amortized behavior is better with multiple packs of
exponentially increasing size that are combined when needed (Martin
Fick's exproll).
2. Jonathan N: There are also unreachable objects to take care of. GC
can/should delete them. But at the same time someone else might be
creating history that still needs those objects. To give objects a grace
period, we turn the unreachable objects into loose objects and look at
their creation time. Alternatively, there’s a proposal to collect these
unreachable objects into a dedicated garbage packfile. That can
be a problem for older git clients, because they might not know the pack
is garbage and might move objects across packs. See the hash function
transition doc for details.
3. Terry: JGit has these unreachable garbage packs.
4. Peff: You want to solve this loose objects explosion problem?
5. Peff: what if you reference an object in the garbage pack from an
object in a non-garbage pack?
6. Jonathan N: At GC time the object from the garbage pack is copied to
a non-garbage pack, basically rescuing it from the garbage. GC only saves
the referenced objects, not the whole garbage pack.
7. Jonathan N: It has been running in production for >2 years.
8. Peff: There are so many non-atomic operations that can happen, and
races between them are possible.
9. Jonathan N: If you find races, please comment on the JGit change that
describes the algorithm. It relies on a happens-before relation and a
grace period.
10. `git gc --prune=now` should no longer create loose objects first,
only to delete them right away.
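
The rolling-pack idea from item 1 can be illustrated with a small sketch. This is not exproll's actual logic; the function name `add_pack`, the `factor` parameter, and the use of plain integers as pack sizes are all assumptions made for illustration. The only point is to show how merging neighbours of similar size keeps pack sizes growing roughly exponentially, so most new packs trigger little or no repacking.

```python
def add_pack(packs, new_size, factor=2):
    """Add a new pack and merge neighbours of similar size.

    `packs` is a list of pack sizes, oldest (largest) first.
    Hypothetical model: a merge simply adds the two sizes together.
    """
    packs = packs + [new_size]
    # Merge the two newest packs while the older one is less than
    # `factor` times the size of the newer one.
    while len(packs) >= 2 and packs[-2] < factor * packs[-1]:
        newest = packs.pop()
        packs[-1] += newest
    return packs


packs = []
for _ in range(7):
    packs = add_pack(packs, 1)
print(packs)  # [4, 2, 1]
```

With unit-sized packs and `factor=2` the list behaves like a binary counter: an eighth pack would cascade into a single pack of size 8, but such full merges become rarer as the repository grows.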
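
The garbage-pack behavior described in items 2 and 6 can be sketched in the same hedged spirit. This is a simplified model, not JGit's algorithm: the `Pack` class, the `collect` function, and the timestamp handling are hypothetical, and the sketch deliberately ignores the races raised in item 8. It shows only three things: referenced objects are copied out of garbage packs, newly unreachable objects go into a fresh garbage pack, and garbage packs are dropped once the grace period has passed.

```python
from dataclasses import dataclass


@dataclass
class Pack:
    objects: set
    garbage: bool = False
    created: float = 0.0  # hypothetical creation timestamp, in seconds


def collect(packs, reachable, now, grace_seconds):
    """Return the packs left after one simplified GC pass."""
    result = []
    # Everything reachable ends up in one live pack. This includes
    # objects copied ("rescued") out of garbage packs.
    everything = set().union(*(p.objects for p in packs))
    live = everything & reachable
    if live:
        result.append(Pack(live, garbage=False, created=now))
    # Garbage packs still inside the grace period are kept as they are;
    # older ones are dropped.
    for p in packs:
        if p.garbage and now - p.created < grace_seconds:
            result.append(p)
    # Objects that just became unreachable start their grace period
    # in a new garbage pack.
    from_live_packs = set().union(*(p.objects for p in packs if not p.garbage))
    newly_unreachable = from_live_packs - reachable
    if newly_unreachable:
        result.append(Pack(newly_unreachable, garbage=True, created=now))
    return result
```

In this model an object that sits in an expired garbage pack survives only if something reachable still refers to it at GC time, which matches the "rescue" described in item 6.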