Re: [PATCH v3 3/3] Add documentation for virtual repositories

On Wed, May 25, 2011 at 12:07:08PM -0400, Jeff King wrote:
> On Tue, May 24, 2011 at 05:46:32PM -0700, Jamey Sharp wrote:
> 
> >  Documentation/Makefile                 |    2 +-
> >  Documentation/git-http-backend.txt     |    4 +-
> >  Documentation/gitvirtual.txt           |   76 ++++++++++++++++++++++++++++++++
> >  contrib/completion/git-completion.bash |    2 +-
> 
> Maybe it would make sense to mention your new options to upload-pack and
> receive-pack in their manpages; the description can be short, but refer
> the user to gitvirtual.

Fair enough.  We'll go ahead and document them (in patch 1/3 with a
reference to gitvirtual added in patch 3/3), and avoid making the pile
of undocumented upload-pack and receive-pack options larger. :)

> > +Given many repositories with copies of the same objects (such as
> > +branches of the same source), sharing a common object store will avoid
> > +duplication.  Alternates provide a single baseline, but don't handle
> > +ongoing activity in the various repositories.  Furthermore, operations
> > +such as linkgit:git-gc[1] need to know about all of the refs.
> 
> It's not quite true that alternates provide only a single baseline. They
> can be updated and objects consolidated over time (e.g., with a nightly
> repack). The problem is that they require management to do so (this is
> also a benefit, if you want a sharing policy besides "all repos have all
> objects").

True enough.  We wanted something that automatically worked without
background maintenance, but alternates can help if you keep moving
common objects to the alternate repository.

> > +linkgit:git-upload-pack[1] and linkgit:git-receive-pack[1] rewrite the
> > +names of refs and heads as specified by the --ref-prefix and --head
> > +options.  For instance, --ref-prefix=`virtual/reponame/` will use
> > ++pass:[refs/virtual/reponame/heads/*]+ and
> > ++pass:[refs/virtual/reponame/tags/*]+.  git-upload-pack and
> > +git-receive-pack will ignore any references that do not match the
> > +specified prefix.
> 
> Thinking on the whole idea a bit more, is there a reason to restrict
> this to upload-pack and receive-pack? Sure, they are the most obvious
> places to use it for hosting, but might I not want to be able to do:
> 
>   cd /path/to/mega-repository.git
>   git --ref-prefix=virtual/repo1 log master
> 
> to do server-side scripting inside the virtual repos (or more likely,
> setting GIT_REF_PREFIX at the top of your script).

Many git commands will need special handling for this, though.  For
instance, gc needs to know about all refs, not just a prefix of refs;
otherwise it will break the repository.  Or, for an example within a
single command, the checks for updating a currently-checked-out ref in a
repository need to use the repository's HEAD, not the virtual HEAD.
And similarly, git checkout with a ref-prefix set would construct a
repository where HEAD doesn't match the workdir.
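
To make the gc concern concrete, here is a toy sketch in Python (not git
code; the commit ids, the `parents` table, and the helper names are all
made up for illustration).  It models gc as "anything not reachable from
the refs I can see is prunable", and shows what happens when the ref list
is filtered by a prefix first:

```python
# Toy model, not git's implementation: refs map names to commit ids,
# and each commit lists its parents.  All ids here are hypothetical.
refs = {
    "refs/virtual/repo1/heads/master": "c2",
    "refs/virtual/repo2/heads/master": "c4",
}
parents = {"c1": [], "c2": ["c1"], "c3": ["c1"], "c4": ["c3"]}


def reachable(tips):
    """Return every commit reachable from the given tips."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def prunable(prefix=""):
    """Commits a gc would drop if it only saw refs under `prefix`."""
    tips = [oid for name, oid in refs.items() if name.startswith(prefix)]
    return set(parents) - reachable(tips)


print(sorted(prunable()))                        # gc sees all refs
print(sorted(prunable("refs/virtual/repo1/")))   # gc sees only repo1's refs
```

With all refs visible nothing is prunable; restricted to repo1's prefix,
the commits that only repo2 references look like garbage.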

Having this handled "transparently" for all git commands seems likely to
run into this kind of corner case, where parts of a git command run
correctly with ref-prefix but other parts or other invoked git commands
must not run with ref-prefix.

I do agree that some other git programs could learn to use ref-prefix,
and it makes sense to move the functionality into refs.c as a general
mechanism for those programs to use.  However, I don't think it makes
sense to transparently make all git programs use ref-prefix without
checking them individually to see if it makes sense.

> > +The --ref-prefix and --head options provide quite a bit of flexibility
> > +in organizing the refs of virtual repositories within those of the
> > +underlying repository.  In the absence of a strong reason to do
> > +otherwise, consider following these conventions:
> > +
> > +--ref-prefix=`virtual/reponame/`::
> > +	This puts refs under `refs/virtual/reponame/`, which avoids a
> > +	namespace conflict between `reponame` and built-in ref
> > +	directories such as `heads` and `tags`.
> > +
> > +--head=`virtual-HEAD/reponame`::
> > +	This puts HEADs under `virtual-HEAD/` to avoid namespace
> > +	conflicts with top-level filenames in a git repository.
> 
> I'm curious if you have a use for this much flexibility. In particular,
> why do the HEAD and refs prefixes need the ability to be separate? Also,
> what about other non-HEAD top-level refs? IOW, a true "virtual
> repository" to me would just be:
> 
>   GIT_REF_PREFIX=refs/virtual/repo1
> 
> and then _every_ ref resolution would just prefix that, whether it was
> in refs/ or not. So you would have:
> 
>   .git/refs/virtual/repo1/HEAD
>   .git/refs/virtual/repo1/refs/heads/master
>   .git/refs/virtual/repo1/refs/tags/v1.0

Ah, *now* I see what you meant by including the repeated "refs/", and
using that to allow putting HEAD in the same namespace makes sense.

We don't actually need the flexibility of putting HEAD in a different
place, and this layout makes sense, so we can change the ref-prefix
mechanism to drop the separate --head entirely.
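
As a rough sketch of the single-prefix mapping (Python for brevity; the
function names are hypothetical and this is not what refs.c would
literally look like), every name, HEAD included, just gets the prefix
prepended, and anything outside the prefix is invisible:

```python
def to_real(prefix, name):
    """Map a ref name seen inside the virtual repo to the underlying repo."""
    return prefix.rstrip("/") + "/" + name


def to_virtual(prefix, real_name):
    """Return the virtual name, or None if the ref lies outside the prefix."""
    base = prefix.rstrip("/") + "/"
    if real_name.startswith(base):
        return real_name[len(base):]
    return None


prefix = "refs/virtual/repo1"
print(to_real(prefix, "HEAD"))
print(to_real(prefix, "refs/heads/master"))
print(to_virtual(prefix, "refs/virtual/repo2/refs/heads/master"))
```

Matching on the prefix plus a trailing slash keeps `repo1` from
accidentally matching a sibling such as `repo10`.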

> > +SECURITY
> > +--------
> > +
> > +Anyone with access to any virtual repository can potentially access
> > +objects from any other virtual repository stored in the same underlying
> > +repository.  You can't directly say "give me object ABCD" if you don't
> > +have a ref to it, but you can do some other sneaky things like:
> > +
> > +. Claiming to push ABCD, at which point the server will optimize out the
> > +  need for you to actually send it. Now you have a ref to ABCD and can
> > +  fetch it (claiming not to have it, of course).
> > +
> > +. Requesting other refs, claiming that you have ABCD, at which point the
> > +  server may generate deltas against ABCD.
> > +
> > +None of this causes a problem if you only host public repositories, or
> > +if everyone who may read one virtual repo may also read everything in
> > +every other virtual repo (for instance, if everyone in an organization
> > +has read permission to every repository).
> 
> Well, this text is obviously correct and written by a very smart person.
> ;)
> 
> You might want to mention that if you do need to handle these security
> concerns, then the alternates route, even though it creates more
> management headache, is going to be more flexible with respect to which
> objects are shared.
> 
> In fact, given what I said at the very top of the email, I wonder if the
> documentation would be better structured as "here are two methods for
> sharing objects, here are reasons why you might choose one or the other,
> and here is how to use each".

I think it makes sense to reference alternates in the gitvirtual page, but
I don't think it makes sense to put the full documentation for both in
the same page.

- Josh Triplett