Re: git gc & deleted branches

Jeremy Maitin-Shepard <jbms@xxxxxxx> · Fri, 09 May 2008 21:51:15 -0400

Junio C Hamano <gitster@xxxxxxxxx> writes:

> "Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes:
>> Jeremy Maitin-Shepard <jbms@xxxxxxx> wrote:
>>> It is extremely cumbersome to have to worry about whether there are
>>> other concurrent accesses to the repository when running e.g. git gc.
>>> For servers, you may never be able to guarantee that nothing else is
>>> accessing the repository concurrently.  Here is a possible solution:
>>> 
>>> Each git process creates a log file of the references that it has
>>> created.  The log file should be named in some way with e.g. the process
>>> id and start time of the process, and simply consist of a list of
>>> 20-byte sha1 hashes to be considered additional in-use references for
>>> the purpose of garbage collection.

> How would that solve the issue that you should not prune/gc the repository
> "clone --shared" aka "alternates" borrows from?

The log files are only for handling in-progress commands editing the
repository.  In the first part of my e-mail I also described a possible
solution to that issue, as well as to the issues created by having
multiple working directories:

When you create a new working directory, you would also create in the
original repository a symlink named
e.g. orig_repo/.git/peers/<some-arbitrary-name-that-doesn't-matter> that
points to the .git directory of the newly created working directory.
git clone --shared would likewise create such a link in the original
repository.  There could be a separate simple command to "destroy" a
repository created via clone --shared or via new-work-dir that would
simply remove this "peer" symlink from any repositories it shares from,
and then rm -rf the target repository.  The list of repositories that a
given target repository shares from would be discovered by one of
several methods, depending on whether it is a new work dir, an
actual separate repository, or the new type of "shared" repository I
suggested in my original e-mail, namely one that has its own refs but
completely shares the object store of the original repository, e.g. via
a symlink to the original repository's objects directory.  In any case, I
believe the information to go "upstream" is already available, and we
just need to add those "peer" symlinks in order to be able to go
"downstream".
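
A rough sketch of the "peer" bookkeeping described above, in Python for
brevity.  This is only an illustration of the idea, not existing git
code: the function names register_peer and list_peers are hypothetical,
and the peers/ directory is the proposed layout, not something git
creates today.

```python
import os
import uuid


def register_peer(orig_git_dir, new_git_dir):
    """Record new_git_dir as a downstream peer of orig_git_dir.

    Creates orig_git_dir/peers/<arbitrary-name> as a symlink pointing at
    the .git directory of the newly created working directory or clone.
    """
    peers_dir = os.path.join(orig_git_dir, "peers")
    os.makedirs(peers_dir, exist_ok=True)
    # The link name does not matter; it only has to be unique.
    link = os.path.join(peers_dir, uuid.uuid4().hex)
    os.symlink(os.path.abspath(new_git_dir), link)
    return link


def list_peers(orig_git_dir):
    """Return the "downstream" .git directories recorded in peers/."""
    peers_dir = os.path.join(orig_git_dir, "peers")
    if not os.path.isdir(peers_dir):
        return []
    links = (os.path.join(peers_dir, name) for name in os.listdir(peers_dir))
    return sorted(os.readlink(link) for link in links if os.path.islink(link))
```

A "destroy" command would do the reverse: remove the matching link from
each upstream repository's peers/ directory before deleting the target.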

There could also be a simple git command to move a repository that would
take care of updating all of the references that other repositories have
to it.  Currently it is not possible to write such a command, because
the "downstream" links are not stored, but with these added symlinks it
would be possible.

As I said in my previous e-mail, if git gc finds any broken symlinks
(i.e. symlinks that point to invalid repositories), it would error out,
because user attention is required to specify whether the symlinks
correspond to deleted repositories, or to repositories that have been
moved without making the proper updates.
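
The check that gc would perform before pruning could look roughly like
the following sketch.  The names check_peers and BrokenPeerError are
hypothetical, and "valid repository" is approximated here by the
presence of a HEAD file in the link target, which is an assumption made
for the example rather than a rule git defines.

```python
import os


class BrokenPeerError(Exception):
    """Raised when peers/ contains links that need user attention."""

    def __init__(self, broken):
        super().__init__("broken peer links: " + ", ".join(broken))
        self.broken = broken


def check_peers(orig_git_dir):
    """Return the live peer repositories, or raise if any link is broken."""
    peers_dir = os.path.join(orig_git_dir, "peers")
    if not os.path.isdir(peers_dir):
        return []
    live, broken = [], []
    for name in sorted(os.listdir(peers_dir)):
        link = os.path.join(peers_dir, name)
        if not os.path.islink(link):
            continue
        # Assumption: a target counts as a repository if it has a HEAD file.
        if os.path.isfile(os.path.join(link, "HEAD")):
            live.append(os.path.realpath(link))
        else:
            broken.append(link)
    if broken:
        # Do not guess: the peer may have been deleted or merely moved.
        raise BrokenPeerError(broken)
    return live
```

gc would then treat the refs of every repository returned by
check_peers as additional roots, and refuse to prune if it raises.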

> By the way, I do not think your "git-commit stopped for two weeks due to a
> long editing session of the commit message" should result in any object
> lossage, as the new objects are all reachable from the index, and the new
> tree nor the new commit hasn't been built while you are typing (rather,
> not typing) the log message.

> Hmm, a partial commit that uses a temporary index file may lose, come to
> think of it.  Perhaps we should teach reachable.c about the temporary
> index file as well.  I dunno.

Well, providing a generic mechanism for telling git about reachable
things other than the index and refs is precisely what these log files
would do.  And because they would record the process id and a
timestamp, stale log files would automatically get cleaned up.  If each
individual git command has its own special way of trying to keep track
of temporary references, it is just going to be more complicated and
more error prone.

-- 
Jeremy Maitin-Shepard