Re: [PATCH] Adding a cache of commit to patch-id pairs to speed up git-cherry

"Shawn O. Pearce" <spearce@xxxxxxxxxxx> · Mon, 2 Jun 2008 11:56:44 -0400

Geoffrey Irving <irving@xxxxxxx> wrote:
> On Mon, Jun 2, 2008 at 8:37 AM, Johannes Schindelin
> > Another issue that just hit me: this cache is append-only, so if it grows
> > too large, you have no other option than to scratch and recreate it.
> > Maybe this needs porcelain support, too?  (git gc?)
> 
> If so, the correct operation is to go through the hash and remove
> entries that refer to commits that no longer exist.  I can add this if
> you want.  Hopefully somewhere along the way git-gc constructs an easy
> to traverse list of extant commits, and this will be straightforward.

git-gc doesn't make such a list.  Deep down within git-pack-objects
(which is called by git-repack, which is called by git-gc) we do
build the list of commits that we can find as reachable, and that
thus should stay in the repository.  But that is really low-level
plumbing.  Wedging a SHA1->SHA1 hashmap gc task down into that is
not a good idea.

Instead you'll need to implement something that does `git rev-list
--all -g` (or the internal equivalent) and then remove any entries
in your hashmap that aren't in that result set.  That's not going
to be very cheap.
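
A rough sketch of that pruning pass, in Python rather than git's C.
The function names and the plain dict standing in for the on-disk
commit->patch-id cache are assumptions for illustration only, not
anything taken from the patch:

```python
import subprocess


def reachable_commits(repo_dir="."):
    """Return the set of commit ids printed by `git rev-list --all -g`.

    Hypothetical helper: shells out to git instead of using the
    internal revision walker.
    """
    out = subprocess.run(
        ["git", "rev-list", "--all", "-g"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return set(out.split())


def prune_cache(cache, live):
    """Keep only entries whose key (a commit id) is still in `live`.

    `cache` is an assumed in-memory mapping of commit id -> patch id.
    """
    return {commit: patch_id
            for commit, patch_id in cache.items()
            if commit in live}
```

Usage would be something like `cache = prune_cache(cache,
reachable_commits())`.  The expensive part is the rev-list walk, not
the filtering.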

Given how small entries are (what, 40 bytes?) I'd only want to bother
with that collection process if the estimated wasted space was over
1M (about 26,000 entries) or some reasonable threshold like that.

E.g. we could just run the GC for this once every 26,000
additions, and only during git-gc.  Yeah, you might waste about 1M
worth of space before we clean up.  Big deal, I'll bet you have
more than that in loose unreachable objects lying around from
git-rebase -i usage.  ;-)
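
The arithmetic behind that threshold, as a small Python sketch.  The
40-byte entry size is the guess from above (two 20-byte SHA-1s), and
the counter and function name are hypothetical, not part of the patch:

```python
ENTRY_BYTES = 40        # assumed: one 20-byte SHA-1 key + one 20-byte SHA-1 value
WASTE_LIMIT = 1 << 20   # 1 MiB of potentially stale entries
GC_THRESHOLD = WASTE_LIMIT // ENTRY_BYTES   # 26214, i.e. roughly 26,000


def should_prune(additions_since_last_prune, threshold=GC_THRESHOLD):
    """True once enough entries were added that up to ~1M could be stale."""
    return additions_since_last_prune >= threshold
```

git-gc would check `should_prune()` against a stored addition count
and only then pay for the rev-list walk.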

-- 
Shawn.