Re: Keeping unreachable objects in a separate pack instead of loose?

On Mon, Jun 11, 2012 at 12:08:24PM -0400, Jeff King wrote:
> On Mon, Jun 11, 2012 at 11:31:03AM -0400, Ted Ts'o wrote:
> 
> > I'm currently using 1.7.10.2.552.gaa3bb87, and a "git gc" still kicked
> > loose a little over 4.5 megabytes of loose objects that were not pruned via
> > "git prune" (since they hadn't yet expired).  These loose objects
> > could be stored in a 244k pack file.
> 
> Out of curiosity, what is the size of the whole repo? If it's a 500M
> kernel repository, then 4.5M is not all _that_ worrisome. Not that it
> could not be better, or that it's not worth addressing (since there are
> corner cases that behave way worse). But it gives a sense of the urgency
> of the problem, if that is the scope of the issue for average use.

It's my e2fsprogs development repo.  I have my "base" repo, which is
what has been pushed out to the public (including a rewinding pu
branch).  The total size of that repo is a little over 15 megs:

<tytso@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> {/usr/projects/e2fsprogs/e2fsprogs}  [maint]
899% ls ../base/objects/pack/
total 16156
  908 pack-6964a1516433f16e43dcdf4fcec1996052099f31.idx
15248 pack-6964a1516433f16e43dcdf4fcec1996052099f31.pack

I then have my development repo, which uses a
.git/objects/info/alternates pointing at the bare "base" repo, so the
only things in this repo are my private development branches, and other
things that haven't been pushed for public consumption.

<tytso@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> {/usr/projects/e2fsprogs/e2fsprogs}  [maint]
900% ls .git/objects/pack/
total 1048
 28 5a486e6c2156109f7dfc725b36a201c10652803d.idx    28 pack-7b2a9cccab669338f61a681e34c39362976fb5de.idx
224 5a486e6c2156109f7dfc725b36a201c10652803d.pack  768 pack-7b2a9cccab669338f61a681e34c39362976fb5de.pack

The 4.5 megabytes of loose objects packed down to a 224k "cruft" pack;
the other pack holds 768k worth of private development objects.

So depending on how you would want to do the comparison, probably the
fairest thing to say is that I had "good" packs totaling about
16 megs, and the loose cruft objects were an additional 4.5 megabytes.

> I don't think that will work, because we will keep repacking the
> unreachable bits into new packs. And the 2-week expiration is based on
> the pack timestamp. So if your "repack -Ad" ends in two packs (the one
> you actually want, and the pack of expired crap), then you would get
> into this cycle:
> 
>   1. You run "git repack -Ad". It makes A.pack, with stuff you want, and
>      B.pack, with unreachable junk. They both get a timestamp of "now".
> 
>   2. A day passes. You run "git repack -Ad" again. It makes C.pack, the
>      new stuff you want, and repacks all of B.pack along with the
>      new expired cruft from A.pack, making D.pack. B.pack can go away.
>      D.pack gets a timestamp of "now".

Hmm, yes.  What we'd really want to do is to make D.pack contain those
items that are newly unreachable, not including the objects in
B.pack, and keep B.pack around until the expiry window goes by.  But
that's a much more complicated thing, and the proof-of-concept
algorithm I had outlined wouldn't do that.
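
The cycle described in the quoted steps can be modeled with a small
Python sketch.  This is only a toy simulation, not git code: the dict
layout, the function names, and the assumption of one repack per day
with one newly unreachable object per day are all hypothetical.  It
shows why a single cruft pack that is rewritten on every repack never
ages past the expiry window.

```python
EXPIRE_DAYS = 14  # the 2-week window discussed above


def naive_repack(cruft_pack, newly_unreachable, today):
    """Roll the old cruft and the new cruft into one pack stamped 'today'."""
    objects = set(newly_unreachable)
    if cruft_pack is not None:
        objects |= cruft_pack["objects"]
    return {"objects": objects, "mtime": today}


def prune(cruft_pack, today):
    """Delete the pack only if its timestamp is older than the window."""
    if cruft_pack is not None and today - cruft_pack["mtime"] >= EXPIRE_DAYS:
        return None
    return cruft_pack


def simulate_naive(days):
    """Run prune followed by repack once per day (hypothetical schedule)."""
    cruft = None
    for today in range(days):
        cruft = prune(cruft, today)
        # Assumption: exactly one object becomes unreachable each day.
        cruft = naive_repack(cruft, {f"obj-day{today}"}, today)
    return cruft


if __name__ == "__main__":
    pack = simulate_naive(30)
    print(len(pack["objects"]), "objects still held; pack mtime:", pack["mtime"])
```

Because the pack's timestamp is reset on every run, prune never sees a
pack older than one day, so the object dropped on day 0 is still
present after 30 days.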

> I think solving it for good would involve a separate list of per-object
> expiration dates. Obviously we get that easily with loose objects (since
> it is one object per file).

Well, either that or we need to teach git-repack the difference
between packs that are expected to contain good stuff, and packs that
contain cruft, and to not copy "old cruft" to new packs, so the old
pack can finally get nuked 2 weeks (or whatever the expire window
might happen to be) later.
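
A rough Python sketch of that idea follows, under the same toy
assumptions (one repack per day, one newly unreachable object per day;
the names and data layout are hypothetical and do not correspond to
any existing git-repack option).  Old cruft packs are never copied
forward; each repack only writes the objects that are not already in a
cruft pack, so every cruft pack keeps its original timestamp and can
be deleted once the window has passed.

```python
EXPIRE_DAYS = 14  # the expiry window; 2 weeks in the discussion above


def cruft_aware_repack(cruft_packs, newly_unreachable, today):
    """Keep unexpired cruft packs untouched and pack only fresh cruft."""
    # Drop whole packs whose timestamp has aged past the window.
    kept = [p for p in cruft_packs if today - p["mtime"] < EXPIRE_DAYS]
    # Do not copy "old cruft" into the new pack.
    already_packed = set().union(*(p["objects"] for p in kept))
    fresh = set(newly_unreachable) - already_packed
    if fresh:
        kept.append({"objects": fresh, "mtime": today})
    return kept


def simulate_cruft_aware(days):
    """Run the repack once per day (hypothetical schedule)."""
    packs = []
    for today in range(days):
        # Assumption: exactly one object becomes unreachable each day.
        packs = cruft_aware_repack(packs, {f"obj-day{today}"}, today)
    return packs


if __name__ == "__main__":
    packs = simulate_cruft_aware(30)
    print(len(packs), "cruft packs; oldest mtime:", min(p["mtime"] for p in packs))
```

The cost of this scheme in the sketch is one small cruft pack per
repack that produced new cruft, which is the bookkeeping trade-off
against keeping a separate per-object list of expiration dates.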

					- Ted