On Wed, Sep 10, 2014 at 09:51:03AM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
>
> > Yes, we don't let normal fetchers see these repos. They're only for
> > holding shared objects and the ref tips to keep them reachable.
>
> Do these individual refs have any relation to the real world after
> they are created? To ask it another way: let's say that a branch in
> a repository which is using this as a shared object store caused one
> of these refs to be created; now the origin repository rewinds or
> deletes that branch. Do you do anything to the ref in the shared
> object store at that point?

Yes, we fetch from them before doing any maintenance in the shared
repository (like running repack). That's how objects migrate into the
shared repository, as well.

> I am wondering if it makes sense to maintain a single ref that
> reaches all the commits in this shared object store repository,
> instead of keeping these millions of refs. When you need to make
> more objects kept and reachable, create an octopus with the current
> tip and the tips of all the refs that cause you to wish to make
> these "more objects kept and reachable". Obviously that won't work
> well if the reason your current scheme uses refs is that you adjust
> individual refs to prune some objects---hence the first question in
> this message.

Exactly. You could do this if you threw away and re-made the octopus
after each fetch (and then threw away the individual branches that went
into it).

For that matter, if all you really want are the tips for reachability,
you can basically run "for-each-ref | sort -u"; most of these refs are
tags that are duplicated between the forks.

However, having the individual tips does make some things easier. If I
kept only unique tips and dropped a tip from fork A, I would then need
to check every other fork to see whether any other fork has the same
tip. OTOH, that means visiting N packed-refs files, each with (let's
say) 3000 refs.
As opposed to dealing with a single packed-refs file with N*3000 refs.
So it's really not that different.

We also use the individual ref tips for packing. They factor into the
bitmap selection, and we have some patches (which I've been meaning to
upstream for a while now) to make delta selections in the shared-object
repository that have a high chance of reuse in clones of individual
forks. And it's useful to be able to query them for various reasons
(e.g., "who is referencing this object?").

There are a lot of different ways to do it, and the giant refs file is
a pain (not to mention writing objects to disk in the forks, and then
migrating them separately to the shared storage). But doing it this way
means that the forks and the shared-object repository are all real
first-class git repositories. We follow the usual object reachability
guarantees, and it's safe to run any stock git command on them.

I am leaning towards a system where the shared-object repository is a
pseudo-repository, the forks write directly into the shared object
store (probably via a symlinked objects/ directory), and a ref backend
generates a virtual mapping on the fly (e.g., all refs in "../foo.git"
appear as "refs/remotes/foo/*" in the pseudo-repository). I'm watching
the refs-backend work anxiously. :)

-Peff
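P.S. For the record, a minimal sketch of the "for-each-ref | sort -u"
idea. It assumes a hypothetical layout in which every fork is a bare
repository under a single directory (FORKS_DIR is my name for
illustration, not part of the real setup); it prints the object each
ref points at, in every fork, and deduplicates:

```shell
#!/bin/sh
# Sketch only: assumes all forks are bare repos under $FORKS_DIR.
FORKS_DIR=${FORKS_DIR:-/path/to/forks}

# Print the tip object of every ref in every fork, then dedup. Most
# refs are tags duplicated between forks, so the unique set should be
# much smaller than the N*3000 total.
for repo in "$FORKS_DIR"/*.git; do
	git --git-dir="$repo" for-each-ref --format='%(objectname)'
done | sort -u
```

For pure reachability this is enough even for annotated tags, since a
tag object reaches its target; keeping the whole packed-refs files, as
described above, is what buys the extra queryability.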