Geoffrey Irving <irving@xxxxxxx> wrote:
> On Mon, Jun 2, 2008 at 8:37 AM, Johannes Schindelin
> > Another issue that just hit me: this cache is append-only, so if it grows
> > too large, you have no other option than to scratch and recreate it.
> > Maybe this needs porcelain support, too? (git gc?)
>
> If so, the correct operation is to go through the hash and remove
> entries that refer to commits that no longer exist. I can add this if
> you want. Hopefully somewhere along the way git-gc constructs an easy
> to traverse list of extant commits, and this will be straightforward.

git-gc doesn't make such a list. Deep down in git-pack-objects
(which is called by git-repack, which is called by git-gc) yes,
we do make the list of commits that we can find as reachable, and
thus should stay in the repository. But that is really low-level
plumbing. Wedging a SHA1->SHA1 hashmap gc task down into that is
not a good idea.

Instead you'll need to implement something that does
`git rev-list --all -g` (or the internal equivalent) and then
remove any entries in your hashmap that aren't in that result set.
That's not going to be very cheap.

Given how small entries are (what, 40 bytes?) I'd only want to
bother with that collection process if the estimated potential
wasted space was over 1M (about 26,000 entries at 40 bytes each)
or some reasonable threshold like that.

E.g. we could just set the GC for this to be once every 26,000
additions, and only during git-gc. Yeah, you might waste about 1M
worth of space before we clean up. Big deal, I'll bet you have
more than that in loose unreachable objects lying around from
git-rebase -i usage. ;-)

--
Shawn.
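
To make the suggestion above concrete, here is a rough Python sketch of the collection step, not the actual patch. It assumes the cache can be loaded as a dict whose keys are 40-character hex commit IDs; the names `reachable_commits`, `should_collect`, `prune_cache`, `ENTRY_BYTES` and `GC_THRESHOLD` are made up for illustration. The only git interface used is the `git rev-list --all -g` command mentioned above.

```python
import subprocess

ENTRY_BYTES = 40        # assumed entry size: 20-byte key + 20-byte value
GC_THRESHOLD = 26_000   # roughly 1M of entries at 40 bytes each


def reachable_commits(repo_dir="."):
    """Return the set of commit IDs printed by `git rev-list --all -g`."""
    out = subprocess.run(
        ["git", "rev-list", "--all", "-g"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return set(out.split())


def should_collect(additions_since_gc, threshold=GC_THRESHOLD):
    """Only bother collecting once enough entries have been added."""
    return additions_since_gc >= threshold


def prune_cache(cache, live):
    """Keep only the entries whose key (assumed to be a commit ID) is live."""
    return {key: value for key, value in cache.items() if key in live}
```

During git-gc this would be used roughly as `if should_collect(n): cache = prune_cache(cache, reachable_commits())`, where `n` is a hypothetical counter of additions since the last collection. The rev-list walk is the expensive part, which is why the threshold check comes first.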