On Wed, May 25, 2011 at 12:07:08PM -0400, Jeff King wrote:
> On Tue, May 24, 2011 at 05:46:32PM -0700, Jamey Sharp wrote:
>
> >  Documentation/Makefile                 |    2 +-
> >  Documentation/git-http-backend.txt     |    4 +-
> >  Documentation/gitvirtual.txt           |   76 ++++++++++++++++++++++++++++++++
> >  contrib/completion/git-completion.bash |    2 +-
>
> Maybe it would make sense to mention your new options to upload-pack and
> receive-pack in their manpages; the description can be short, but refer
> the user to gitvirtual.

Fair enough. We'll go ahead and document them (in patch 1/3 with a
reference to gitvirtual added in patch 3/3), and avoid making the pile
of undocumented upload-pack and receive-pack options larger. :)

> > +Given many repositories with copies of the same objects (such as
> > +branches of the same source), sharing a common object store will avoid
> > +duplication. Alternates provide a single baseline, but don't handle
> > +ongoing activity in the various repositories. Furthermore, operations
> > +such as linkgit:git-gc[1] need to know about all of the refs.
>
> It's not quite true that alternates provide only a single baseline. They
> can be updated and objects consolidated over time (e.g., with a nightly
> repack). The problem is that they require management to do so (this is
> also a benefit, if you want a sharing policy besides "all repos have all
> objects").

True enough. We wanted something that automatically worked without
background maintenance, but alternates can help if you keep moving
common objects to the alternate repository.

> > +linkgit:git-upload-pack[1] and linkgit:git-receive-pack[1] rewrite the
> > +names of refs and heads as specified by the --ref-prefix and --head
> > +options. For instance, --ref-prefix=`virtual/reponame/` will use
> > ++pass:[refs/virtual/reponame/heads/*]+ and
> > ++pass:[refs/virtual/reponame/tags/*]+. git-upload-pack and
> > +git-receive-pack will ignore any references that do not match the
> > +specified prefix.
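
For illustration only, here is a minimal Python sketch of the ref-name
rewriting that the quoted paragraph describes. It is not the code from
the patch (which is C inside git); the function names are hypothetical,
and it assumes the prefix ends in a slash, as in the
`virtual/reponame/` example.

```python
from typing import Optional


def to_real_ref(ref_prefix: str, virtual_ref: str) -> str:
    """Map a ref name as the client sees it to its name in the underlying repo.

    Assumes ref_prefix ends with "/" (e.g. "virtual/reponame/").
    """
    if not virtual_ref.startswith("refs/"):
        raise ValueError(f"not under refs/: {virtual_ref!r}")
    # refs/heads/master -> refs/virtual/reponame/heads/master
    return "refs/" + ref_prefix + virtual_ref[len("refs/"):]


def to_virtual_ref(ref_prefix: str, real_ref: str) -> Optional[str]:
    """Inverse mapping; returns None for refs outside the prefix.

    Refs that do not match the prefix are the ones the quoted text
    says upload-pack and receive-pack would ignore.
    """
    real_base = "refs/" + ref_prefix
    if not real_ref.startswith(real_base):
        return None
    return "refs/" + real_ref[len(real_base):]
```

With the prefix `virtual/reponame/`, a client's `refs/heads/master`
would be stored as `refs/virtual/reponame/heads/master`, and a ref such
as `refs/virtual/other/heads/master` would not be visible at all.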
>
> Thinking on the whole idea a bit more, is there a reason to restrict
> this to upload-pack and receive-pack? Sure, they are the most obvious
> places to use it for hosting, but might I not want to be able to do:
>
>   cd /path/to/mega-repository.git
>   git --ref-prefix=virtual/repo1 log master
>
> to do server-side scripting inside the virtual repos (or more likely,
> setting GIT_REF_PREFIX at the top of your script).

Many git commands will need special handling for this, though. For
instance, gc needs to know about all refs, not just a prefix of refs;
otherwise it will break the repository. Or, for an example within a
single command, the checks for updating a currently-checked-out ref in a
repository need to use the repository's HEAD, not the virtual HEAD. And
similarly, git checkout with a ref-prefix set would construct a
repository where HEAD doesn't match the workdir. Having this handled
"transparently" for all git commands seems likely to run into this kind
of corner case, where parts of a git command run correctly with
ref-prefix but other parts or other invoked git commands must not run
with ref-prefix.

I do agree that some other git programs could learn to use ref-prefix,
and it makes sense to move the functionality into refs.c as a general
mechanism for those programs to use. However, I don't think it makes
sense to transparently make all git programs use ref-prefix without
checking them individually to see if it makes sense.

> > +The --ref-prefix and --head options provide quite a bit of flexibility
> > +in organizing the refs of virtual repositories within those of the
> > +underlying repository. In the absence of a strong reason to do
> > +otherwise, consider following these conventions:
> > +
> > +--ref-prefix=`virtual/reponame/`::
> > +    This puts refs under `refs/virtual/reponame/`, which avoids a
> > +    namespace conflict between `reponame` and built-in ref
> > +    directories such as `heads` and `tags`.
> > +
> > +--head=`virtual-HEAD/reponame`::
> > +    This puts HEADs under `virtual-HEAD/` to avoid namespace
> > +    conflicts with top-level filenames in a git repository.
>
> I'm curious if you have a use for this much flexibility. In particular,
> why do the HEAD and refs prefixes need the ability to be separate? Also,
> what about other non-HEAD top-level refs? IOW, a true "virtual
> repository" to me would just be:
>
>   GIT_REF_PREFIX=refs/virtual/repo1
>
> and then _every_ ref resolution would just prefix that, whether it was
> in refs/ or not. So you would have:
>
>   .git/refs/virtual/repo1/HEAD
>   .git/refs/virtual/repo1/refs/heads/master
>   .git/refs/virtual/repo1/refs/tags/v1.0

Ah, *now* I see what you meant by including the repeated "refs/", and
using that to allow putting HEAD in the same namespace makes sense. We
don't actually need the flexibility of putting HEAD in a different
place, and this layout makes sense, so we can change the ref-prefix
mechanism to drop the separate --head entirely.

> > +SECURITY
> > +--------
> > +
> > +Anyone with access to any virtual repository can potentially access
> > +objects from any other virtual repository stored in the same underlying
> > +repository. You can't directly say "give me object ABCD" if you don't
> > +have a ref to it, but you can do some other sneaky things like:
> > +
> > +. Claiming to push ABCD, at which point the server will optimize out the
> > +  need for you to actually send it. Now you have a ref to ABCD and can
> > +  fetch it (claiming not to have it, of course).
> > +
> > +. Requesting other refs, claiming that you have ABCD, at which point the
> > +  server may generate deltas against ABCD.
> > +
> > +None of this causes a problem if you only host public repositories, or
> > +if everyone who may read one virtual repo may also read everything in
> > +every other virtual repo (for instance, if everyone in an organization
> > +has read permission to every repository).
>
> Well, this text is obviously correct and written by a very smart person.
> ;)
>
> You might want to mention that if you do need to handle these security
> concerns, then the alternates route, even though it creates more
> management headache, is going to be more flexible with respect to which
> objects are shared.
>
> In fact, given what I said at the very top of the email, I wonder if the
> documentation would be better structured as "here are two methods for
> sharing objects, here are reasons why you might choose one or the other,
> and here is how to use each".

I think it makes sense to reference alternates in the gitvirtual page,
but I don't think it makes sense to put the full documentation for both
in the same page.

- Josh Triplett
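
As a rough illustration of the single-prefix layout discussed earlier
in this message (one prefix applied to every ref, HEAD included), here
is a small Python sketch. The helper names are hypothetical, and
GIT_REF_PREFIX is only the name proposed in this thread, not something
the sketch assumes git itself provides.

```python
from typing import Dict


def prefixed(ref_prefix: str, name: str) -> str:
    """Prefix any ref name, whether or not it lives under refs/."""
    return ref_prefix.rstrip("/") + "/" + name


def visible_refs(ref_prefix: str, all_refs: Dict[str, str]) -> Dict[str, str]:
    """Return only the refs under the prefix, with the prefix stripped.

    all_refs maps full ref names in the underlying repository to
    object ids (placeholder strings here).
    """
    base = ref_prefix.rstrip("/") + "/"
    return {
        name[len(base):]: value
        for name, value in all_refs.items()
        if name.startswith(base)
    }


# Hypothetical contents of a shared repository holding two virtual repos.
refs = {
    "refs/virtual/repo1/HEAD": "ref: refs/heads/master",
    "refs/virtual/repo1/refs/heads/master": "<object id A>",
    "refs/virtual/repo2/refs/heads/master": "<object id B>",
}

repo1 = visible_refs("refs/virtual/repo1", refs)
```

Here `prefixed("refs/virtual/repo1", "HEAD")` gives
`refs/virtual/repo1/HEAD`, matching the layout in the quoted proposal,
and `repo1` contains only `HEAD` and `refs/heads/master`. A command such
as gc would still have to walk the full `refs` mapping rather than the
filtered view, which is the corner case raised above.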