Re: Keeping unreachable objects in a separate pack instead of loose?

On Mon, Jun 11, 2012 at 11:31:03AM -0400, Ted Ts'o wrote:

> I'm currently using 1.7.10.2.552.gaa3bb87, and a "git gc" still kicked
> loose a little over 4.5 megabytes of objects that were not pruned via
> "git prune" (since they hadn't yet expired).  These loose objects
> could be stored in a 244k pack file.

Out of curiosity, what is the size of the whole repo? If it's a 500M
kernel repository, then 4.5M is not all _that_ worrisome. Not that it
could not be better, or that it's not worth addressing (since there are
corner cases that behave way worse). But it gives a sense of the urgency
of the problem, if that is the scope of the issue for average use.

> What I think would make sense is for git pack-objects to have a new
> option which outputs a list of object IDs which would have been
> kicked out as loose objects if it had been given the (undocumented)
> --unpack-unreachable option.  Then the git-repack shell script (if
> given the -A option) would use that new option instead of
> --unpack-unreachable, and then using the list created by this new
> option, create another pack which contains all of these
> unreachable-but-not-yet-expired objects.

I don't think that will work, because we will keep repacking the
unreachable bits into new packs, and the 2-week expiration is based on
the pack timestamp. So if your "repack -Ad" ends in two packs (the one
you actually want, and the pack of unreachable cruft), then you would
get into this cycle:

  1. You run "git repack -Ad". It makes A.pack, with stuff you want, and
     B.pack, with unreachable junk. They both get a timestamp of "now".

  2. A day passes. You run "git repack -Ad" again. It makes C.pack, the
     new stuff you want, and repacks all of B.pack along with the
     newly unreachable cruft from A.pack, making D.pack. B.pack can go
     away. D.pack gets a timestamp of "now".

And so on. As long as you repack within the two-week window, the objects
from the cruft pack will never get ejected. You might suggest that
the problem is that in step 2, we repack the items from B. But if you
don't, then you will accumulate a bunch of cruft packs (2 weeks' worth),
and the objects in different packs won't be delta'd against each other.
It's probably better than making them all loose, of course (you get
chunks of delta'd objects from each repack, instead of none at all), but
it's far from a full solution to the issue.
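
To make the cycle concrete, here is a toy simulation in Python. It is
not git code: CruftPack, repack() and simulate() are names made up for
illustration, whole days stand in for timestamps, and it assumes the
only expiration check is "is the cruft pack itself older than 2 weeks".

```python
from dataclasses import dataclass, field

EXPIRE_DAYS = 14  # the two-week grace period discussed above


@dataclass
class CruftPack:
    mtime: int                                 # day the pack file was written
    objects: set = field(default_factory=set)  # unreachable object names


def repack(day, old_cruft, newly_unreachable):
    """Model one "git repack -Ad" under the proposed two-pack scheme.

    The old cruft pack is dropped only if the pack as a whole has expired;
    otherwise its contents are rolled into a fresh pack stamped `day`.
    """
    survivors = set()
    if old_cruft is not None and day - old_cruft.mtime < EXPIRE_DAYS:
        survivors = old_cruft.objects
    return CruftPack(mtime=day, objects=survivors | set(newly_unreachable))


def simulate(interval, total_days):
    """Repack every `interval` days; return the final cruft pack."""
    cruft = None
    for day in range(0, total_days + 1, interval):
        # Assume each repack finds exactly one newly unreachable object.
        cruft = repack(day, cruft, {f"obj-day{day}"})
    return cruft
```

With simulate(1, 60), the object from day 0 is still sitting in the
cruft pack on day 60, because every repack resets the pack timestamp.
Only when the gap between repacks exceeds the window, as in
simulate(15, 60), does old cruft actually get dropped.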

I think solving it for good would involve a separate list of per-object
expiration dates. Obviously we get that easily with loose objects (since
it is one object per file).
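
A rough sketch of what such a list could look like, again as a Python
toy model rather than anything git implements. The ledger dict and
repack_with_ledger() are hypothetical, and I am assuming the recorded
date is the day an object was first seen as unreachable (with loose
objects the equivalent would be the file's mtime).

```python
EXPIRE_DAYS = 14  # same two-week grace period as above


def repack_with_ledger(day, ledger, unreachable_now):
    """Model one repack that tracks a date per unreachable object.

    ledger maps object name -> day it was first seen unreachable.
    Returns (new_ledger, pruned). Objects that became reachable again
    are simply absent from unreachable_now and fall out of the ledger.
    """
    new_ledger = {}
    pruned = set()
    for oid in unreachable_now:
        # Keep the original date if we have one; repacking never resets it.
        first_seen = ledger.get(oid, day)
        if day - first_seen >= EXPIRE_DAYS:
            pruned.add(oid)
        else:
            new_ledger[oid] = first_seen
    return new_ledger, pruned
```

Because the date travels with the object instead of with the pack,
repacking daily no longer extends anything's lifetime: each object is
ejected two weeks after it first showed up in the ledger.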

As a workaround, it might be worth lowering the default gc.pruneExpire
from 2 weeks to 1 day or something. The grace period is really about
creating safety for operations in progress (e.g., you write the object,
and then are _about_ to add it to the index or update a ref when it gets
pruned). I think the 2 weeks number was pulled out of a hat as "absurdly
long for an operation to take", and was never revisited because nobody
cared or complained.

-Peff