Re: [PATCH v3 1/3] Support multiple virtual repositories with a single object store and refs

On Tue, May 24, 2011 at 06:21:00PM -0700, Junio C Hamano wrote:
> Jamey Sharp <jamey@xxxxxxxxxxx> writes:
> 
> > From: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> >
> > Given many repositories with copies of the same objects (such as branches of
> > the same source), sharing a common object store will avoid duplication.
> > Alternates provide a single baseline, but don't handle ongoing activity in the
> > various repositories.  Furthermore, operations such as git-gc need to know
> > about all of the refs.
> >
> > Git supports storing multiple virtual repositories within the object store and
> > references of a single underlying repository.  The underlying repository
> > stores the objects for all of the virtual repositories, and includes all the
> > refs and heads of the virtual repositories using prefixed names.
> 
> I do not see anything changed up to this point since the previous
> round... sent a wrong patch?

Apparently so. I watched Josh fix up that commit message, and then I
don't know where it went.

> In any case, I _think_ what you are trying to say is:
> 
>  - Implemented in the most naïve way, you can host multiple instances of
>    related projects, but that is wasteful; their object stores will have
>    duplicated objects without sharing. (This is the crucial part missing
>    from your description that confused me when trying to _guess_ what
>    problem you are trying to solve in the first place).
> 
>  - You _could_ use alternates mechanism to alleviate that problem, but it
>    has issues, e.g. gc needs to be aware of other repositories (This is in
>    your first paragraph).
> 
>  - Instead, we could store a single, large, repository and carve out its
>    refs namespaces into multiple hierarchies, to make it look as if there
>    are multiple repositories. (The first sentence of the second paragraph
>    also confused me, as you said "Git supports storing multiple ..." in
>    present tense).

Yes. I hope you won't mind if we blatantly steal this description. :-)
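
The third point above can be sketched as a rough model: one flat ref table
for the underlying repository, and a per-project view obtained by filtering
on a prefix and stripping it. This is only an illustration, not the patch's
code; the function name virtual_view and the "hosted-2-project" prefix are
made up, and "refs/hosted-1-project/" is borrowed from the example later in
this thread.

```python
def virtual_view(all_refs, prefix):
    """Return the refs under `prefix`, renamed as if they lived in their own repository."""
    view = {}
    for name, object_hash in all_refs.items():
        if name.startswith(prefix):
            # Strip the project prefix and put the remainder back under refs/.
            view["refs/" + name[len(prefix):]] = object_hash
    return view


# Hypothetical contents of the single underlying repository.
underlying = {
    "refs/hosted-1-project/heads/master": "a" * 40,
    "refs/hosted-1-project/tags/v1.0": "b" * 40,
    "refs/hosted-2-project/heads/master": "c" * 40,
}

for name, object_hash in sorted(virtual_view(underlying, "refs/hosted-1-project/").items()):
    print(object_hash, name)
```

A client of the first project would see only refs/heads/master and
refs/tags/v1.0, while gc in the underlying repository still sees all three
refs.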

> One thing you would want to be careful with is what to do with the HEAD
> symrefs, which should appear to read "ref: refs/heads/<some-branch>" from
> the point of view of the clients that are under the illusion that they are
> interacting with one specific repository among others, while for the
> purpose of gc and things in the huge single repository they should be
> pointing at something like "refs/hosted-1-project/heads/<that-branch>",

As far as I can tell, that isn't true. Judging by the pack-protocol
documentation, my reading of the implementation, and the results of some
tests I ran, symrefs are resolved to hashes before being sent over the
wire, and then HEAD is magically re-inferred back into a symref on the
other end.

(This has the odd property that if you create a repository containing
two branches with identical heads, then clone that repository, the
clone's origin/HEAD will point to an arbitrarily chosen one of the two
branches. Tested in version 1.7.4.4, and seems to be a necessary
consequence of the protocol design.)
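
A toy model of that behaviour, assuming the advertisement carries only
(name, hash) pairs with HEAD already resolved: the receiving side can do no
better than look for a branch whose hash equals HEAD's. The functions
advertise and guess_head are hypothetical, and the "first match wins" rule
is an assumption of this sketch, not a description of git's actual
tie-breaking.

```python
def advertise(refs, head_target):
    """Build the ref advertisement: HEAD is sent as a plain hash, not as a symref."""
    advertisement = [("HEAD", refs[head_target])]
    advertisement.extend(sorted(refs.items()))
    return advertisement


def guess_head(advertisement):
    """Return the first advertised branch whose hash matches HEAD, or None."""
    head_hash = dict(advertisement).get("HEAD")
    for name, object_hash in advertisement:
        if name.startswith("refs/heads/") and object_hash == head_hash:
            return name
    return None


refs = {
    "refs/heads/maint": "1" * 40,
    "refs/heads/topic": "1" * 40,  # same commit as maint
    "refs/heads/next": "2" * 40,
}

# The server's HEAD really points at "topic", but the client sees only hashes.
guessed = guess_head(advertise(refs, "refs/heads/topic"))
print(guessed)
```

In this model the client ends up with refs/heads/maint even though the
server's HEAD names refs/heads/topic, because the name of the symref target
never crosses the wire.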

As a result, symrefs only need to be valid in the underlying repository;
there's no mapping needed for the protocol. However, you probably do
want a different HEAD for each virtual repository, which is why we added
the --head option.

We didn't actually think about the impact of these virtual HEADs on gc. As
long as they're all symrefs, they can't matter for gc, right? The head
they reference is already a suitable gc root. If the virtual HEADs do
need to participate in gc, then I guess we should update the conventions
documentation to recommend that they live somewhere under refs/.
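
The gc reasoning can be checked against a small model in which a ref value
is either a hash or a "ref: <target>" string. This is a sketch under that
assumption only; resolve, gc_roots, and the "hosted-1-project/HEAD" location
are hypothetical names, not anything git or the patch defines.

```python
def resolve(refs, name, max_depth=5):
    """Follow 'ref: <target>' entries until a hash is reached."""
    for _ in range(max_depth):
        value = refs[name]
        if not value.startswith("ref: "):
            return value
        name = value[len("ref: "):]
    raise ValueError("symref chain too deep")


def gc_roots(refs):
    """Set of object hashes that gc would treat as roots in this model."""
    return {resolve(refs, name) for name in refs}


refs = {
    "refs/hosted-1-project/heads/master": "a" * 40,
    "refs/hosted-2-project/heads/master": "b" * 40,
}

# A virtual HEAD that is a symref adds no new root.
with_symref = dict(refs)
with_symref["hosted-1-project/HEAD"] = "ref: refs/hosted-1-project/heads/master"
print(gc_roots(with_symref) == gc_roots(refs))

# A virtual HEAD holding a bare hash (detached) does add one.
with_detached = dict(refs)
with_detached["hosted-1-project/HEAD"] = "c" * 40
print(gc_roots(with_detached) == gc_roots(refs))
```

The first comparison holds and the second does not, which is the case where
keeping virtual HEADs under refs/ would start to matter.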

> but other than that, after a lot of guesswork, the problem you are trying
> to solve seems clearer to me.
> 
> But please do not make me guess.

Indeed. We'll get that right next round, honest this time. :-/

Now that you have the problem statement down, is the proposed solution
acceptable for merge?

Jamey


