Re: Keeping unreachable objects in a separate pack instead of loose?

On Mon, Jun 11, 2012 at 02:20:12PM -0400, Ted Ts'o wrote:

> On Mon, Jun 11, 2012 at 01:54:19PM -0400, Jeff King wrote:
> > 
> > You're doing it wrong (but you can hardly be blamed, because there isn't
> > good tool support for doing it right). You should never prune or repack
> > in the base repo without taking into account all of the refs of its
> > children.
> 
> Well, I don't do a simple gc.  See the complicated set of steps I use
> to make sure I don't lose loose commits at the end of my last e-mail
> message on this thread.  It gets worse when I have multiple devel
> repos, but I simplified things for the purposes of discussion.

Ah, right. I was thinking that your first step, which is "git repack
-Adfl", would throw out old objects rather than unpack them in recent
versions of git.  But due to the way I implemented it (namely that you
must pass --unpack-unreachable yourself, so this feature only kicks in
automatically for "git gc"), that is not the case.
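
To make the difference concrete, here is a minimal sketch in a throwaway
repository. The scratch directory, the "topic" ref, and the identity
values are made up for illustration; only the git commands themselves
are real. It packs a commit, makes that commit unreachable, and then
runs the same "git repack -Adfl" step to show that the object is turned
loose instead of being dropped.

```shell
#!/bin/sh
set -e

# Scratch repository used only for this demonstration (hypothetical path).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

# One ordinary commit on the current branch.
echo hello >file
git add file
git commit -q -m "first"

# A second commit that is reachable only through refs/heads/topic.
orphan=$(git commit-tree "HEAD^{tree}" -p HEAD -m "side commit")
git update-ref refs/heads/topic "$orphan"

# Pack everything, then drop the only ref that points at the side commit.
git repack -a -d -q
git update-ref -d refs/heads/topic

# Plain -A (no --unpack-unreachable=<when>): unreachable objects from the
# old pack are written out as loose objects rather than discarded.
git repack -A -d -f -l -q

loose=$(git count-objects -v | sed -n 's/^count: //p')
echo "loose objects after repack: $loose"
git cat-file -t "$orphan"
```

In the versions under discussion, "git gc" instead hands repack a
--unpack-unreachable=<when> value taken from its prune expiry, so
sufficiently old unreachable objects are dropped at this step rather
than loosened.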

I don't recall if that was an accident, or if I was very clever in
maintaining backwards compatibility for your case. Let's just assume the
latter. :)

> > We have a similar setup at GitHub (every time you "fork" a repo, it is
> > creating a new repo that links back to a project-wide "network" repo for
> > its object store). We maintain a refs/remotes/XXX directory for each
> > child repo, which stores the complete refs/ hierarchy of that child.
> 
> So you basically are copying the refs around and making sure the
> parent repo has an up-to-date pointer to all of the child repos, such
> that when you do the repack, *all* of the commits end up in the parent
> repo, correct?

Yes. The child repositories generally have no objects in them at all
(they occasionally do for a period between runs of the migration
script).
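
Roughly, the bookkeeping looks like the sketch below. The directory
names ("network.git", "child") and the "child" namespace under
refs/remotes/ are placeholders for illustration, not the names our
migration script uses; the script itself is not shown here.

```shell
#!/bin/sh
set -e

# Throwaway directory holding a shared parent and one child (names are
# placeholders for illustration).
work=$(mktemp -d)
git init -q --bare "$work/network.git"
git init -q "$work/child"

# Give the child one commit on a known branch name.
git -C "$work/child" symbolic-ref HEAD refs/heads/main
git -C "$work/child" config user.name "Example User"
git -C "$work/child" config user.email "user@example.com"
echo hello >"$work/child/file"
git -C "$work/child" add file
git -C "$work/child" commit -q -m "first"

# Copy the child's complete refs/ hierarchy (and the objects it needs)
# into the parent under refs/remotes/child/.
git -C "$work/network.git" fetch -q "$work/child" '+refs/*:refs/remotes/child/*'

# Let the child borrow objects from the parent's object store.
echo "$work/network.git/objects" >"$work/child/.git/objects/info/alternates"

# Repacking the parent now takes the child's refs into account.
git -C "$work/network.git" repack -a -d -q

child_tip=$(git -C "$work/child" rev-parse refs/heads/main)
parent_copy=$(git -C "$work/network.git" rev-parse refs/remotes/child/heads/main)
echo "child:  $child_tip"
echo "parent: $parent_copy"
```

Once the parent holds a copy of every child's refs, a repack or prune
in the parent cannot drop an object that some child still needs.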

> The system that I'm using means that objects which are local to a
> child repo stay in the child repo, and if an object is about to be
> dropped from the parent repo as a result of a gc, the child repo has
> an opportunity to claim a copy of that object for itself in its object
> database.

That implies the concept of "local to a child repo", which implies that
you have some set of "common" refs. I suspect in your case your base
repo represents the master branch, or something similar. We actually
treat our network repo as a pure parent; every repo, including the
original one that everybody forks from, is a child. That makes it easier
to treat the original repo as just another repo (e.g., the original
owner is free to delete it, and the forks won't care).

> You can do things either way.  I like knowing that objects only used
> by a child repo are in the child repo's .git directory, but that's
> arguably more of a question of taste than anything else.

Yeah, I don't think there is any real benefit to it.

-Peff