On Tue, May 24, 2011 at 06:21:00PM -0700, Junio C Hamano wrote:
> Jamey Sharp <jamey@xxxxxxxxxxx> writes:
>
> > From: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> >
> > Given many repositories with copies of the same objects (such as branches of
> > the same source), sharing a common object store will avoid duplication.
> > Alternates provide a single baseline, but don't handle ongoing activity in the
> > various repositories.  Furthermore, operations such as git-gc need to know
> > about all of the refs.
> >
> > Git supports storing multiple virtual repositories within the object store and
> > references of a single underlying repository.  The underlying repository
> > stores the objects for all of the virtual repositories, and includes all the
> > refs and heads of the virtual repositories using prefixed names.
>
> I do not see anything changed up to this point since the previous
> round... sent a wrong patch?

Apparently so. I watched Josh fix up that commit message, and then I
don't know where it went.

> In any case, I _think_ what you are trying to say is:
>
>  - Implemented in the most naïve way, you can host multiple instances of
>    related projects, but that is wasteful; their object stores will have
>    duplicated objects without sharing. (This is the crucial part missing
>    from your description that confused me when trying to _guess_ what
>    problem you are trying to solve in the first place).
>
>  - You _could_ use alternates mechanism to alleviate that problem, but it
>    has issues, e.g. gc needs to be aware of other repositories (This is in
>    your first paragraph).
>
>  - Instead, we could store a single, large, repository and carve out its
>    refs namespaces into multiple hierarchies, to make it look as if there
>    are multiple repositories. (The first sentence of the second paragraph
>    also confused me, as you said "Git supports storing multiple ..." in
>    present tense).

Yes. I hope you won't mind if we blatantly steal this description. :-)

> One thing you would want to be careful with is what to do with the HEAD
> symrefs, which should appear to read "ref: refs/heads/<some-branch>" from
> the point of view of the clients that are under the illusion that they are
> interacting with one specific repository among others, while for the
> purpose of gc and things in the huge single repository they should be
> pointing at something like "refs/hosted-1-project/heads/<that-branch>",

As far as I can tell, that isn't true. Judging by the pack-protocol
documentation, my reading of the implementation, and the results of some
tests I ran, symrefs are resolved to hashes before being sent over the
wire, and then HEAD is magically re-inferred back into a symref on the
other end.

(This has the odd property that if you create a repository containing
two branches with identical heads, then clone that repository, the
clone's origin/HEAD will point to a randomly-selected one of the two
branches. Tested in version 1.7.4.4, and seems to be a necessary
consequence of the protocol design.)

As a result, symrefs only need to be valid in the underlying repository;
there's no mapping needed for the protocol. However, you probably do
want a different HEAD for each virtual repository, which is why we added
the --head option.

We didn't actually think about impact of these virtual HEADs on gc. As
long as they're all symrefs, they can't matter for gc, right? The head
they reference is already a suitable gc root.

If the virtual HEADs do need to participate in gc, then I guess we
should update the conventions documentation to recommend that they live
somewhere under refs/.

> but other than that, after a lot of guesswork, the problem you are trying
> to solve seems clearer to me.
>
> But please do not make me guess.

Indeed. We'll get that right next round, honest this time. :-/

Now that you have the problem statement down, is the proposed solution
acceptable for merge?

Jamey
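
The HEAD re-inference described above can be illustrated with a small
Python sketch. It is a toy model under stated assumptions, not git's
actual code: the function names, the ref dictionary, and the shortened
object ids are all hypothetical. It only shows why a client that sees
hashes alone cannot tell which of two identical branches HEAD named.

```python
# Toy model only: advertise() and guess_head() are hypothetical names,
# and the object ids are made-up placeholders, not real git hashes.

def advertise(refs, symrefs):
    """Server side: resolve every symref to the hash it points at.

    `refs` maps ref names to object ids; `symrefs` maps a symref name
    (e.g. "HEAD") to its target ref name. The result carries hashes only,
    so the symref target itself is not visible to the client.
    """
    advertised = dict(refs)
    for name, target in symrefs.items():
        advertised[name] = refs[target]
    return advertised


def guess_head(advertised):
    """Client side: list every branch whose hash equals HEAD's hash."""
    head = advertised.get("HEAD")
    return sorted(
        name for name, oid in advertised.items()
        if name.startswith("refs/heads/") and oid == head
    )


refs = {
    "refs/heads/master": "1111111",
    "refs/heads/topic": "1111111",   # same commit as master
    "refs/heads/other": "2222222",
}
wire = advertise(refs, {"HEAD": "refs/heads/topic"})
candidates = guess_head(wire)
print(candidates)
```

With two branches at the same commit, both come back as candidates, so
whichever one the client picks is a guess rather than the server's real
HEAD target.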