On Wed, Sep 10, 2014 at 09:51:03AM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
>
> > Yes, we don't let normal fetchers see these repos. They're only for
> > holding shared objects and the ref tips to keep them reachable.
>
> Do these individual refs have any relation to the real world after
> they are created? To ask it another way: let's say that a branch in
> a repository which is using this as a shared object store caused one
> of these refs to be created; now the origin repository rewinds or
> deletes that branch. Do you do anything to the ref in the shared
> object store at that point?

Yes, we fetch from them before doing any maintenance in the shared
repository (like running repack). That's how objects migrate into the
shared repository, as well.

> I am wondering if it makes sense to maintain a single ref that
> reaches all the commits in this shared object store repository,
> instead of keeping these millions of refs. When you need to make
> more objects kept and reachable, create an octopus with the current
> tip and the tips of all the refs that cause you to wish to make
> these "more objects kept and reachable". Obviously that won't work
> well if the reason your current scheme uses refs is that you adjust
> individual refs to prune some objects---hence the first question in
> this message.

Exactly. You could do this if you threw away and re-made the octopus
after each fetch (and then threw away the individual branches that went
into it).

For that matter, if all you really want are the tips for reachability,
you can basically run "for-each-ref | sort -u"; most of these refs are
tags that are duplicated between the forks.

However, having the individual tips does make some things easier. If I
kept only unique tips and dropped a tip from fork A, I would then need
to check every other fork to see whether any other fork has the same
tip. OTOH, that means visiting N packed-refs files, each with (let's
say) 3000 refs.
As opposed to dealing with a single packed-refs file with N*3000 refs.
So it's really not that different.

We also use the individual ref tips for packing. They factor into the
bitmap selection, and we have some patches (which I've been meaning to
upstream for a while now) to make delta selections in the shared-object
repository that have a high chance of reuse in clones of individual
forks. And it's useful to be able to query them for various reasons
(e.g., "who is referencing this object?").

There are a lot of different ways to do it, and the giant refs file is
a pain (not to mention writing objects to disk in the forks, and then
migrating them separately to the shared storage). But doing it this way
means that the forks and the shared-object repository are all real
first-class git repositories. We follow the usual object reachability
guarantees, and it's safe to run any stock git command on them.

I am leaning towards a system where the shared-object repository is a
pseudo-repository, the forks write directly into the shared object
store (probably via a symlinked objects/ directory), and a ref backend
generates a virtual mapping on the fly (e.g., all refs in "../foo.git"
appear as "refs/remotes/foo/*" in the pseudo-repository). I'm watching
the refs-backend work anxiously. :)

-Peff
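P.S. For the record, a minimal sketch of the "for-each-ref | sort -u"
idea. It assumes a hypothetical layout in which every fork is a bare
repository under a single directory (FORKS_DIR is my name for
illustration, not part of the real setup); it prints the object each
ref points at, in every fork, and deduplicates:

```shell
#!/bin/sh
# Sketch only: assumes all forks are bare repos under $FORKS_DIR.
FORKS_DIR=${FORKS_DIR:-/path/to/forks}

# Print the tip object of every ref in every fork, then dedup. Most
# refs are tags duplicated between forks, so the unique set should be
# much smaller than the N*3000 total.
for repo in "$FORKS_DIR"/*.git; do
	git --git-dir="$repo" for-each-ref --format='%(objectname)'
done | sort -u
```

For pure reachability this is enough even for annotated tags, since a
tag object reaches its target; keeping the whole packed-refs files, as
described above, is what buys the extra queryability.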