Jeremy Maitin-Shepard <jbms@xxxxxxx> wrote:
> It is extremely cumbersome to have to worry about whether there are
> other concurrent accesses to the repository when running e.g. git gc.
> For servers, you may never be able to guarantee that nothing else is
> accessing the repository concurrently. Here is a possible solution:
>
> Each git process creates a log file of the references that it has
> created. The log file should be named in some way with e.g. the
> process id and start time of the process, and simply consist of a
> list of 20-byte sha1 hashes to be considered additional in-use
> references for the purpose of garbage collection.

I believe we partially considered that in the past and discarded it
as far too complex, implementation-wise, for the benefit it would
give us.

The current approach of leaving unreachable loose objects around for
2 weeks is good enough.  Any Git process that has been running for
2 weeks while still not linking everything it needs into the
reachable refs of that repository is already braindamaged and
shouldn't be running anymore.

If we are dealing with a pack file, those are protected by .keep
"lock files" between the time they are created on disk and the time
that the git-fetch or git-receive-pack process has finished updating
the refs to anchor the pack's contents as reachable.

Every once in a while a stale .keep file gets left behind when a
process gets killed by the OS, and it's damn annoying to clean up.
I'd hate to have to clean up logs from every little git-add or
git-commit that aborted uncleanly in the middle.

--
Shawn.
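[A minimal sketch of the two safeguards described above, written in
Python rather than git's own C, so it is an illustration of the policy
and not git's implementation. GIT_DIR, EXPIRE_SECONDS, and the
is_reachable() helper are assumptions made only for this example.]

    import os
    import time

    GIT_DIR = ".git"                    # assumed repository location
    EXPIRE_SECONDS = 14 * 24 * 3600     # the two-week grace period

    def is_reachable(sha1):
        """Placeholder: would walk the refs and report whether sha1 is
        reachable from any of them."""
        raise NotImplementedError

    def prune_loose_objects(now=None):
        """Delete unreachable loose objects, but only once they are older
        than the grace period, so objects written by an in-flight
        operation that has not yet updated its refs are left alone."""
        now = now or time.time()
        objects_dir = os.path.join(GIT_DIR, "objects")
        for fanout in os.listdir(objects_dir):
            if len(fanout) != 2:
                continue                # skip pack/ and info/
            subdir = os.path.join(objects_dir, fanout)
            for name in os.listdir(subdir):
                path = os.path.join(subdir, name)
                if is_reachable(fanout + name):
                    continue
                if now - os.path.getmtime(path) < EXPIRE_SECONDS:
                    continue            # too young: may still be anchored later
                os.unlink(path)

    def prunable_packs():
        """Yield pack files that are fair game for repacking or deletion.
        A pack with a matching .keep file is protected until the fetch or
        receive-pack that created it removes the .keep."""
        pack_dir = os.path.join(GIT_DIR, "objects", "pack")
        for name in os.listdir(pack_dir):
            if not name.endswith(".pack"):
                continue
            keep = os.path.join(pack_dir, name[:-5] + ".keep")
            if not os.path.exists(keep):
                yield os.path.join(pack_dir, name)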