GC of alternate object store (was: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Oswald Buddenhagen wrote:
> (...)so the second approach is the "bare aggregator repo" which adds
> all other repos as remotes, and the other repos link back via
> alternates. problems:
> 
> - to actually share objects, one always needs to push to the aggregator

Run a cron job which frequently does that?

> - tags having a shared namespace doesn't actually work, because the
> repos have the same tags on different commits (they are independent
> repos, after all)

Junio's proposal partially fixes that: It pushes refs/* instead of
refs/heads/*, to refs/remotes/<borrowing repo>/.  However...

> - one still cannot safely garbage-collect the aggregator, as the refs
> don't include the stashes and the index, so rebasing may invalidate
> these more transient objects.

Also if you copy a repo (e.g. making a backup) instead of cloning it,
and then start using both, they'll push into the same namespace -
overwriting each other's refs.  Non-fast-forward pushes can thus lose
refs to objects needed by the other repo.

receive.denyNonFastForwards only rejects pushes to refs/heads/ or
something.  (A feature, as I learned when I reported it as bug:-)
IIRC Git has no config option to reject all non-fast-forward pushes.

> i would re-propose hallvard's "volatile" alternates (at least i think that's
> what he was talking about two weeks ago): they can be used to obtain
> objects, but every object which is in any way referenced from the current
> clone must be available locally (or from a "regular" alternate). that means
> that diffing, etc.  would get objects only temporarily, while cherry-picking
> would actually copy (some of) the objects. this would make it possible to
> "cross-link" repositories, safely and without any "3rd parties".

I'm afraid that idea by itself won't work:-(  Either you borrow from a
store or not.  If Git uses an object from the volatile store, it can't
always know if the caller needs the object to be copied.

OTOH volatile stores which you do *not* borrow from would be useful:
Let fetch/repack/gc/whatever copy missing objects from there.


2nd attempt for a way to gc of the alternate repo:  Copy the with
removed objects into each borrowing repo, then gc them.   Like this:

1. gc, but pack all to-be-removed objects into a "removable" pack.

2. Hardlink/copy the removable pack - with a .keep file - into
   borrowing repos when feasible:  I.e. repos you can find and
   have write access to.  Update their .git/objects/info/packs.
   (Is there a Git command for this?)  Repeat until nothing to do,
   in case someone created a new repo during this step.

3. Move the pack from the alternate repo to a backup object store
   which will keep it for a while.

4. Delete the .keep files from step (2).  They were needed in case
   a user gc'ed away an object from the pack and then added an
   identical object - borrowed from the to-be-removed pack.

5. gc/repack the other repos at your leisure.

666. Repos you could not update in step (2), can get temporarily
   broken.  Their owners must link the pack from the backup store by
   hand, or use that store as a volatile store and then gc/repack.

Loose objects are a problem:  If a repo has longer expiry time(s)
than the alternate store, it will get loads of loose objects from all
repos which push into the alternate store.  Worse, gc can *unpack*
those objects, consuming a lot of space.  See threads "git gc == git
garbage-create from removed branch" (3 May) and "Keeping unreachable
objects in a separate pack instead of loose?" (10 Jun).

Presumably the work-arounds are:
- Use long expiry times in the alternate repo.  I don't know which
  expiration config settings are relevant how.
- Add some command which checks and warns if the repo has longer
  expiry time than the repo it borrows from.
Also I hope Git will be changed to instead pack such loose objects
somewhere, as discussed in the above threads.

All in all, this isn't something you'd want to do every day.  But it
looks doable and can be scripted.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]