Junio C Hamano <gitster@xxxxxxxxx> writes:

> "Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes:
>> Jeremy Maitin-Shepard <jbms@xxxxxxx> wrote:
>>> It is extremely cumbersome to have to worry about whether there are
>>> other concurrent accesses to the repository when running e.g. git gc.
>>> For servers, you may never be able to guarantee that nothing else is
>>> accessing the repository concurrently. Here is a possible solution:
>>>
>>> Each git process creates a log file of the references that it has
>>> created. The log file should be named in some way with e.g. the process
>>> id and start time of the process, and simply consist of a list of
>>> 20-byte sha1 hashes to be considered additional in-use references for
>>> the purpose of garbage collection.

> How would that solve the issue that you should not prune/gc the repository
> "clone --shared" aka "alternates" borrows from?

The log files are only for handling in-progress commands that edit the repository. In the first part of that e-mail I also describe a possible solution to that issue, as well as to the issues created by having multiple working directories:

When you create a new working directory, you would also create in the original repository a symlink named e.g. orig_repo/.git/peers/<some-arbitrary-name-that-doesn't-matter> that points to the .git directory of the newly created working directory. git clone --shared would likewise create such a link in the original repository. There could be a separate simple command to "destroy" a repository created via clone --shared or via new-work-dir; it would simply remove this "peer" symlink from every repository it shares from, and then rm -rf the target repository.
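A minimal sketch of the "peer" link bookkeeping described above, assuming the layout <orig>/.git/peers/<arbitrary-name> -> <new repo>/.git. The function names register_peer, unregister_peer and destroy_repository are hypothetical; git has no such commands. The random link name reflects the point that the name does not matter, only the link target does.

```python
import os
import shutil
import uuid


def register_peer(orig_git_dir, peer_git_dir):
    """Create <orig_git_dir>/peers/<random-name> pointing at peer_git_dir."""
    peers_dir = os.path.join(orig_git_dir, "peers")
    os.makedirs(peers_dir, exist_ok=True)
    # The link name is arbitrary; only its target is ever read back.
    link = os.path.join(peers_dir, uuid.uuid4().hex)
    os.symlink(os.path.abspath(peer_git_dir), link)
    return link


def unregister_peer(orig_git_dir, peer_git_dir):
    """Remove every peer link in orig_git_dir that points at peer_git_dir."""
    peers_dir = os.path.join(orig_git_dir, "peers")
    if not os.path.isdir(peers_dir):
        return 0
    target = os.path.abspath(peer_git_dir)
    removed = 0
    for name in os.listdir(peers_dir):
        link = os.path.join(peers_dir, name)
        if os.path.islink(link) and os.path.abspath(os.readlink(link)) == target:
            os.unlink(link)
            removed += 1
    return removed


def destroy_repository(worktree, upstream_git_dirs):
    """Hypothetical 'destroy': drop the peer links upstream, then rm -rf."""
    git_dir = os.path.join(worktree, ".git")
    for upstream in upstream_git_dirs:
        unregister_peer(upstream, git_dir)
    # shutil.rmtree unlinks symlinks it finds rather than following them,
    # so links a new work dir keeps back into the original are removed,
    # not their targets.
    shutil.rmtree(worktree)
```

Because the peers directory only stores links, a "move" command could be written the same way: rewrite each upstream link to the new location instead of deleting it.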
The list of repositories that a given target repository shares from could be discovered in several different ways, depending on whether it is a new work dir, an actual separate repository, or the new type of "shared" repository I suggested in my original e-mail, namely one that has its own refs but completely shares the object store of the original repository, e.g. via a symlink to the original repository's objects directory.

In any case, I believe the information needed to go "upstream" is already available, and we just need to add those "peer" symlinks in order to be able to go "downstream".

There could also be a simple git command to move a repository that would take care of updating all of the references that other repositories have to it. Currently it is not possible to write such a command, because the "downstream" links are not stored, but with these added symlinks it would be possible.

As I said in my previous e-mail, if git gc finds any broken symlinks (i.e. symlinks that point to invalid repositories), it would error out, because user attention is required to specify whether the symlinks correspond to deleted repositories, or to repositories that have been moved without making the proper updates.

> By the way, I do not think your "git-commit stopped for two weeks due to a
> long editing session of the commit message" should result in any object
> lossage, as the new objects are all reachable from the index, and the new
> tree nor the new commit hasn't been built while you are typing (rather,
> not typing) the log message.

> Hmm, a partial commit that uses a temporary index file may lose, come to
> think of it. Perhaps we should teach reachable.c about the temporary
> index file as well. I dunno.
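The gc-side check on "peer" symlinks proposed earlier in this reply might look roughly like the following sketch. It assumes the same hypothetical peers/ layout, and it treats "is a valid repository" as "has a HEAD file", which is only a stand-in for real repository detection. The names live_peers and BrokenPeerError are illustrative, not git API.

```python
import os


class BrokenPeerError(Exception):
    """Raised when a peer link no longer points at a repository."""


def live_peers(git_dir):
    """Return the .git dirs of all downstream peers, or raise if any is broken.

    gc would then treat the refs of each returned peer as extra roots
    before pruning anything from the shared object store.
    """
    peers_dir = os.path.join(git_dir, "peers")
    if not os.path.isdir(peers_dir):
        return []
    peers, broken = [], []
    for name in sorted(os.listdir(peers_dir)):
        target = os.path.join(peers_dir, name)
        # os.path.isfile follows the symlink, so a dangling link fails here.
        if os.path.isfile(os.path.join(target, "HEAD")):
            peers.append(os.path.realpath(target))
        else:
            broken.append(name)
    if broken:
        # Refuse to guess whether the peer was deleted or moved.
        raise BrokenPeerError(
            "peer links need attention (deleted or moved?): " + ", ".join(broken))
    return peers
```

Failing hard rather than silently skipping a dangling link matches the reasoning above: only the user knows whether the repository is gone or was moved without the links being updated.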
Well, providing a generic mechanism for telling git about reachable things other than the index and refs is precisely what these log files would do. Also, because they would record the process id and a timestamp, stale log files could be cleaned up automatically. If each individual git command has its own special way of trying to keep track of temporary references, it is just going to be more complicated and more error-prone.

-- 
Jeremy Maitin-Shepard
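The per-process log files discussed in this thread could be sketched as follows. Assumptions, all hypothetical: logs live in <git_dir>/inuse/<pid>-<start-time>, each entry is one raw 20-byte SHA-1, and a log counts as stale once its process no longer exists (a POSIX os.kill(pid, 0) probe). The start time is recorded in the file name, but this sketch does not use it; a fuller version would compare it with the live process's start time to catch pid reuse.

```python
import os
import time

LOG_SUBDIR = "inuse"  # hypothetical location: <git_dir>/inuse/<pid>-<start>


def _log_dir(git_dir):
    return os.path.join(git_dir, LOG_SUBDIR)


def open_log(git_dir):
    """Create this process's log file, named by pid and start time."""
    os.makedirs(_log_dir(git_dir), exist_ok=True)
    name = f"{os.getpid()}-{int(time.time())}"
    return open(os.path.join(_log_dir(git_dir), name), "ab")


def record(log, sha1_hex):
    """Append one object name as 20 raw bytes and force it to disk."""
    raw = bytes.fromhex(sha1_hex)
    if len(raw) != 20:
        raise ValueError("expected a 40-hex-digit SHA-1")
    log.write(raw)
    log.flush()
    os.fsync(log.fileno())


def _pid_alive(pid):
    # POSIX: signal 0 only checks whether the process exists.
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def extra_roots(git_dir):
    """Collect SHA-1s from live processes' logs; delete logs of dead ones."""
    roots = set()
    d = _log_dir(git_dir)
    if not os.path.isdir(d):
        return roots
    for name in os.listdir(d):
        path = os.path.join(d, name)
        pid = int(name.split("-", 1)[0])
        if not _pid_alive(pid):
            os.unlink(path)  # stale: its process has exited
            continue
        with open(path, "rb") as f:
            data = f.read()
        # Skip a trailing partial record left by an interrupted write.
        for i in range(0, len(data) - len(data) % 20, 20):
            roots.add(data[i:i + 20].hex())
    return roots
```

gc would add extra_roots() to the refs and index when computing reachability, so objects a running command has created but not yet referenced survive a concurrent prune.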