Brandon Casey <casey@xxxxxxxxxxxxxxx> wrote:
> Shawn O. Pearce wrote:
> >
> >  b) Don't repack the source repository without accounting for the
> >     refs and reflogs of all --shared repositories that came from it.
> >     Otherwise you may delete objects that the source repository no
> >     longer needs, but that one or more of the --shared repositories
> >     still needs.
>
> How should this be accomplished?  Does this mean never run
> git-gc/git-repack on the source repository?  Or is there a way to
> cause the sharing repositories to copy over objects no longer
> required by the source repository?

Well, you can repack, but only if you account for everything. The easiest way to do this is to push every branch from the --shared repos to the source repository, repack the source repository, and then run `git prune-packed` in the --shared repos to remove loose objects that the source repository now has.

You can account for the refs by hand when you run pack-objects by hand, but it's horribly difficult compared to the push-then-repack sequence I just described.

I think long-lived --shared isn't that common a workflow; most people use --shared for short-term things. For example, contrib/continuous uses --shared when it clones the repository to create a temporary build area.

> >> 4) is space savings obtained only at initial clone? or is it on going?
> >>    does a future git pull from the source repository create new hard
> >>    links where possible?
> >
> > Only on initial clone.  Later pulls will copy.  You can try using
> > git-relink to redo the hardlinks after the pull.
>
> How about with --shared?  Particularly with a fast-forward not much
> would need to be copied over.  Do later pulls into a repository with
> configured objects/info/alternates take advantage of space savings
> when possible?

Yes. In recent versions a --shared repository avoids copying the objects if at all possible. This makes fetches from the source repository into the --shared repository very, very fast, and uses no additional disk.

> If the answer above is "yes", then this brings up an interesting use
> case.  I assume that clone, fetch, etc follow the alternates of the
> source repository?  Otherwise a --shared repository would be
> unclone-able right?  And only pull-able from the source repository?
> So if that is the case (that remote alternates are followed),

Alternates are followed as many as 5 deep. So you can do something like this:

    git clone --shared source share1
    git clone --shared share1 share2
    git clone --shared share2 share3
    git clone --shared share3 share4
    git clone --shared share4 share5
    git clone --shared share5 corrupt

I think that last repository, corrupt, really is corrupt; it doesn't have access to the source anymore and is therefore missing 90%+ of the object database. To help make this case work, the objects/info/alternates file should always contain absolute paths; git-clone stores them as absolute by default, but you could set them up by hand.

The other repositories should however be intact and usable; you just cannot clone from share5. Normal fetch/push/pull will work fine against any of those working repos, as they all use the normal Git object transport methods, which means we copy objects unless they are already available to us (see above).

> then a group of developers
> could add all of the other developers to their alternates list (if
> multiple alternates are supported)

Yes, they are. I don't think we have a limit on the number of alternates you are allowed to have. However, each additional alternate adds some cost to starting up any given Git process. The more alternates you have (or the more deeply nested they are), the slower Git will initialize itself. For 1 or 2 alternates it's within the fork+exec noise of any good UNIX system; for 50 alternates I think you would notice it.
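Just to illustrate (the repository paths below are invented), an objects/info/alternates file pointing at two other developers' repositories is nothing more than one path per line, each naming another repository's objects directory, written as absolute paths per the advice above:

    /home/jsmith/project.git/objects
    /home/alee/project.git/objects

Every object reachable through those directories (and through their own alternates, up to the nesting limit described above) becomes available to the borrowing repository without being copied.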
> and reference their objects when
> possible.  To the extent that it is possible, each developer would end
> up only storing their commit objects.  This would then create a
> distributed repository.

Yes, but that has very high risk. If developer Joe Smith quits and the administrator then runs `rm -rf /home/jsmith`, everyone is hosed, as they can no longer access the objects that were originally created by Joe. Then the administrator is off looking for backup tapes, assuming he has them and they are valid.

One nice property of Git (really of any DVCS) is that the data is automatically backed up by every developer participating in the project. It's unlikely you will lose the project that way.

Also, this scheme doesn't really work well for packing. I don't think we'll pack the loose objects that we borrow from the other developers, and Git packfiles are a major performance improvement for all Git operations. Plus they are very small, so they save a lot of disk. You might find that it takes less total disk to have everyone keep a complete (non --shared) copy of the project but repack regularly, than to have everyone using alternates against each other with nobody repacking.

> Of course, this new distributed repository may be somewhat fragile since
> the entire thing could become unusable if any portion was corrupted.
> Just because you can do a thing, doesn't mean you should.

Yes, exactly.  ;-)

In my day-job repositories I have about 150 MiB of blobs that are very common across a number of Git repositories. I've made a single repo that has all of those packed, and then set that up as an alternate for everything else. It saves a huge chunk of disk for us.

But that common-blob.git thing that I created never gets changed, and never gets repacked. It's sort of a "historical archive" for us. Works very nicely. Alternates have their uses...

-- 
Shawn.