Re: [PATCH] Support multiple virtual repositories with a single object store and refs

Jeff King <peff@xxxxxxxx> · Tue, 24 May 2011 15:59:37 -0400

On Mon, May 23, 2011 at 06:02:52PM -0700, Josh Triplett wrote:

> Given many repositories with copies of the same objects (such as
> branches of the same source), sharing a common object store will avoid
> duplication.  Alternates provide a single baseline, but don't handle
> ongoing activity in the various repositories.  Git safely handles
> concurrent accesses to the same object store across repositories, but
> operations such as gc need to know about all of the refs.
> 
> This change adds support in upload-pack and receive-pack to simulate
> multiple virtual repositories within the object store and references of
> a single underlying repository.

Neat idea. It is important to note, though, that it is possible to leak
information between virtual repos that share the same object store. You
can't directly say "give me object ABCD" if you don't have a ref to it,
but you can do some other sneaky things like:

  1. Claiming to push ABCD, at which point the server will optimize out
     the need for you to actually send it. Now you have a ref to ABCD
     and can fetch it (claiming not to have it, of course).

  2. Requesting other refs, claiming that you have ABCD, at which point
     the server may generate deltas against ABCD.

Both are problems with alternates, too, of course. But in the case of
alternates, you can share only a subset of the objects. So every day or
so, you could pack all of the objects that _all_ repos can see into one
big alternates repo, and then each "leaf" repo contains any objects
private to itself.

Of course none of this is a concern if you are just hosting public
repositories, or everyone who gets to see one virtual repo can see
what's in other ones (e.g., everybody is sharing objects within one
organization).

But it may make sense to touch on these issues in the documentation
(which also needs to be written at all :) ).

> The refs and heads of the virtual repositories get stored in the
> underlying repository using prefixed names specified by the
> --ref-prefix and --head options; for instance, --ref-prefix=repo1/
> will use refs/repo1/heads/* and refs/repo1/tags/*.  upload-pack and
> receive-pack will not expose any references that do not match the
> specified prefix.

You have a namespace clash if a repo is named "heads" or "tags" or
"remotes". Should we give it its own namespace, like:

  refs/virtual/repo1/heads/*

?
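
To make the clash concrete, here is a small Python sketch. It is not code
from the patch: old_scheme() is a hypothetical helper that only models the
mapping described in the quoted text, where --ref-prefix=repo1/ yields
refs/repo1/heads/*.

```python
def old_scheme(prefix, ref):
    """Model of the quoted mapping: refs/heads/x -> refs/<prefix>heads/x.

    Hypothetical helper for illustration; not part of git or the patch.
    """
    if not ref.startswith("refs/"):
        raise ValueError("expected a ref under refs/: %r" % ref)
    return "refs/" + prefix + ref[len("refs/"):]


# The example from the patch description.
print(old_scheme("repo1/", "refs/heads/foo"))

# A virtual repo that happens to be named "heads", with a branch "foo" ...
clashing = old_scheme("heads/", "refs/heads/foo")
print(clashing)

# ... lands inside the underlying repository's own branch namespace, where
# it cannot be told apart from an ordinary branch named "heads/foo".
print(clashing.startswith("refs/heads/"))
```

With a dedicated root such as refs/virtual/, no choice of repo name can
collide with heads, tags, or remotes.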

Also, it seems conceptually simpler to me if it's a straight prefix.
IOW, "refs/heads/foo" in repo1 becomes:

  refs/virtual/repo1/refs/heads/foo

Then, if we are operating in the virtual repo1 space:

  1. It is an easy test to know whether we are allowed to see a ref:
     "does it start with refs/virtual/$repo/ ?"

  2. Converting back and forth is simple. You just prepend or strip the
     refs/virtual/$repo/ prefix.
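
The two points above can be sketched in a few lines of Python. This is only
an illustration of the straight-prefix idea, not code from git or from the
patch; the function names and the VIRTUAL_ROOT constant are made up for the
example.

```python
VIRTUAL_ROOT = "refs/virtual/"  # assumed namespace root, as proposed above


def virtual_prefix(repo):
    """Prefix under which all refs of one virtual repo are stored."""
    return VIRTUAL_ROOT + repo + "/"


def is_visible(repo, underlying_ref):
    """Point 1: a ref is visible iff it starts with refs/virtual/$repo/."""
    return underlying_ref.startswith(virtual_prefix(repo))


def to_underlying(repo, ref):
    """Point 2, one direction: prepend the prefix to the virtual ref name."""
    return virtual_prefix(repo) + ref


def to_virtual(repo, underlying_ref):
    """Point 2, other direction: strip the prefix again."""
    if not is_visible(repo, underlying_ref):
        raise ValueError("%r is not in virtual repo %r" % (underlying_ref, repo))
    return underlying_ref[len(virtual_prefix(repo)):]


stored = to_underlying("repo1", "refs/heads/foo")
print(stored)
print(to_virtual("repo1", stored))
print(is_visible("repo1", "refs/virtual/repo10/refs/heads/foo"))
```

Note that the trailing slash is part of the test: without it, repo1 would
also match refs that belong to a repo named repo10.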

-Peff