On Wed, 10 Oct 2007, Han-Wen Nienhuys wrote:
>
> The way I solved that, was to have both repositories pointing to each
> other, using alternates.

Ouch. Double un-good.

Not a good idea. Especially not if you do development in both and pull and
push between them.

What will happen is that if you do alternates pointing both ways, you
basically end up having a "shared pool of objects". So it's pretty much
equivalent to just using a shared object directory, and it has *exactly*
the same issues with object reachability and references: you have a shared
pool of objects, but you only ever see *one* set of references, so garbage
collection cannot work - because it will always see just a subset of the
real references, while it sees essentially all objects.

> could it be that GC does not handle cyclic alternates correctly?

It's not about cyclic per se: it's about the fact that GC will do garbage
collection based on reachability with the local references. Which is
normally fine. It's normally fine, because the object tree is "local" too.

But when doing alternates:

 - the tree that is being used as an alternate *has* to be totally stable.
   It must *never* have been re-based, or have any GC'able objects in the
   first place. IOW, doing a "git gc" on it will be safe, because there is
   no way any objects that the other alternate depends on could be pruned.

 - You definitely must *not* do a two-way alternate, because that violates
   another rule: the rule that the "alternate base" (which is now *both* of
   the repositories) is self-sufficient. Since they both point to each
   other, there's no way to know whether they are self-sufficient or not:
   they may be re-using each other's objects *and* packs!

And in the above, the "*and* packs" is important, and probably the cause
of your problems. Because "git repack -a -d -l" (which is what "git gc"
does) will always gather up any loose objects even from remote sites, but
the "-l" means that it will not do so for alternate packed objects.

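For anyone unsure whether their setup has this two-way link: the alternates
live in each repository's objects/info/alternates file, one object
directory per line. Here is a rough sketch of a check, not anything git
ships. The function names are made up for illustration, and it assumes
non-bare repositories (<repo>/.git/objects) whose alternates files use
absolute paths.

```shell
#!/bin/sh
# Sketch: detect whether two repositories borrow objects from each other.
# Assumptions (not from the original mail): non-bare repos laid out as
# <repo>/.git/objects, and alternates files listing absolute paths.

# lists_alternate REPO OTHER
# Succeeds if REPO's alternates file names OTHER's object directory.
lists_alternate() {
    alt_file="$1/.git/objects/info/alternates"
    [ -f "$alt_file" ] || return 1
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fqx -- "$2/.git/objects" "$alt_file"
}

# mutual_alternates A B
# Succeeds only if the link goes both ways.
mutual_alternates() {
    lists_alternate "$1" "$2" && lists_alternate "$2" "$1"
}

# Usage (paths are hypothetical):
#   mutual_alternates /srv/git/A /srv/git/B && echo "two-way alternates"
```

A one-way link still needs the "totally stable" rule from the first bullet;
this only flags the two-way case.
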
So what happens is that if one of the repositories can reach some object
that is in a pack in the other repository, "git gc" will still *leave* it
dependent on a pack in the other repository. But maybe that object isn't
even reachable in the other repo any more (for whatever reason - a rebase,
whatever), then when you repack the other repository, now all the packs
will be replaced by one new pack - and the one new pack will only contain
the objects reachable from the other repo.

IOW: alternates are dangerous. A shared object directory is dangerous. You
should basically only do it under very controlled circumstances, and
otherwise you should use either hardlinks or if you want added safety,
totally separate repositories.

Basically, here's an example of badness, with A and B being repos that
point to each other.

 - do something in A
 - pull it into B - this leaves the objects in A, because of the alternates
   link.
 - rebase A
 - "git gc" in A: this removes unreachable objects from A, and now B is
   screwed.

So the rule really is: never *ever* do anything but fast-forward in a repo
that is an alternate for another one. If you do a circular link, I think
it's still safe if you follow that rule, but now obviously the rule holds
for *both* repos (and quite frankly, I'd worry so much that I'd never do it
even then).

There should be another rule too: git on its own is not a backup system.
You can use git *as* a backup system, but you need to do so by mirroring
the whole repository, and not on the same disk.

(ie, for me, git *is* a backup system, but that's only because I push my
repos to other sites - a single git repo on its own has zero redundancy)

		Linus