A naive proposal for preventing loose object explosions

I am imagining what I consider to be a naive approach to preventing
loose unreachable object explosions. It may seem a bit heavy-handed
at first, but every conversation about this issue so far seems to have
died out, so I am looking for a simple, incremental improvement to what
we have today. My theory is that this approach will provide the same
protections (good and bad) against races as regularly running
git-repack -A -d and git-prune --expire <time> does today.

1a) Add a --prune-packed option to git-repack to force a call to git
prune-packed without having to pass the -d option to git-repack.

1b) Add a --keep <marker> option to git-repack that creates a .keep
file containing <marker> for each existing pack file that was
repacked (but not for the new pack).

1c) Now instead of running:

 git-repack -A -d

run:

 git-repack --prune-packed --keep 'prune-when-expired'


This should effectively keep a duplicate copy of all old pack files
around, while the new pack file will not contain unreferenced objects.
This is similar to leaving unreachable loose objects around, but it
also keeps extra copies of reachable objects, wasting some disk space.
While this will normally consume more disk space in pack files, it
will not explode loose objects, which will likely save a lot of space
whenever such explosions would have occurred. It should also avoid the
severe performance penalties of those explosions. Object lookups
should not get any slower than if repack had not been run at all, and
the extra new pack might actually help find some objects more quickly.
Safety with respect to unreachable-object race conditions should
presumably be the same as with git repack -A -d, since at least one
copy of every object is kept around during this run.


Then:

2a) Add support for passing a list of pack files to git-repack.
This list would then be used as the original "existing" list instead
of finding all pack files without .keep files.

2b) Add an --expire-marked <marker> option to git-prune that finds
any pack files whose .keep file contains <marker> and checks whether
each one meets the --expire time. For those that do, it also calls:

   git-repack -a -d <expired-pack-files>...

This should repack any reachable objects from the <expired-pack-files>
into a single new pack file. This may again cause some duplication of
reachable objects (likely with the same performance effects as the
first git-repack phase above), but unreachable objects from
<expired-pack-files> will now have been pruned, just as they would
have been if they had originally been turned into loose objects.
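The selection half of step 2b can be sketched the same way. This
minimal Python sketch assumes the .keep file's mtime records when step 1
retired the pack (a choice made here, not something git tracks), and it
takes the expiry as plain seconds rather than git's --expire date
syntax. It only builds the argv for the proposed
`git repack -a -d <expired-pack-files>` form from 2a; git-repack does not
accept a list of pack files today, so nothing is run. Both function
names are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def expired_marked_packs(pack_dir: Path, marker: str, expire_seconds: float,
                         now: Optional[float] = None) -> list[Path]:
    """Step 2b selection: packs whose .keep holds `marker` and is old enough."""
    now = time.time() if now is None else now
    expired = []
    for keep in sorted(pack_dir.glob("*.keep")):
        if keep.read_text().strip() != marker:
            continue  # kept for some other reason; never touch it
        # Assumption: the .keep mtime is when step 1 retired this pack.
        if now - keep.stat().st_mtime >= expire_seconds:
            expired.append(keep.with_suffix(".pack"))
    return expired


def proposed_repack_argv(packs: list[Path]) -> list[str]:
    """Build argv for the *proposed* 2a form; real git-repack takes no pack list."""
    return ["git", "repack", "-a", "-d", *(str(p) for p in packs)]
```

Comparing the whole .keep contents against the marker, rather than
searching for it as a substring, leaves .keep files created for other
purposes untouched.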

3) Finally, on the next repack cycle, the currently duplicated
reachable objects should get fully reconsolidated into a single copy.

Does this sound like it would work? I may attempt to build this for
internal use (since it is a bit hacky). It feels like it could be done
mostly with some simple shell modifications and wrappers (which feels
less scary than messing with the core C tools). Am I missing some
obvious flaw in this approach?

Thanks for any insights,

-Martin

