Re: Keeping unreachable objects in a separate pack instead of loose?

On Mon, Jun 11, 2012 at 05:39:48PM -0400, Jeff King wrote:
> 
> Yeah. It doesn't eliminate duplicates, but that may not be worth caring
> about. I find the "cruft" marking a little hacky, because it is only
> "objects in here _may_ be cruft", but as long as that is understood, it
> is OK (and it is understood in the sequence above; "repack -Ad" is safe
> because it knows that it would have repacked any non-cruft).

Well, all the objects in the file *were* cruft at the time that it was
created.  And the reason why we are keeping them around is in case we
were wrong about their being cruft, so I guess I don't have that much
trouble with the name.  Something like "KillShelter" (as in the
opposite of No-Kill Animal Shelters) would be more descriptive, but I
think it's a bit lacking in taste....

> > It does imply that we may accumulate a new cruft-<SHA1> pack each time
> > we run git gc, but users shouldn't be running git gc all that often
> > anyway.  And even if they do run it all the time, it will still be
> > more efficient than keeping the unreachable objects as loose objects.
> 
> Yeah, it would be nice to keep it all in a single pack, but that means
> doing the I/O on rewriting the cruft packs each time. And figuring out
> some way of handling the mtime in such a way that we don't keep
> refreshing the age during each gc.

Well, I'd like to avoid doing the I/O because I want to minimize wear
on SSD drives; and given that it's unlikely that the cruft packs will
be referenced, the fact that we have a bunch of cruft packs shouldn't
be a big deal, especially if we teach git to search the cruft packs
last.
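
As a rough illustration of "search the cruft packs last" (in Python,
not git's C, and assuming cruft packs can be recognised by a
hypothetical "cruft-" filename prefix as floated in this thread), the
lookup order could be arranged like this:

```python
def order_packs_for_lookup(pack_names):
    """Return pack names with cruft packs moved to the end.

    The "cruft-" prefix is an assumption based on the naming proposed
    in this thread; it is not something git is known to implement.
    """
    # sorted() is stable, so the relative order inside each group
    # (regular packs, then cruft packs) is preserved.
    return sorted(pack_names, key=lambda name: name.startswith("cruft-"))
```

Regular packs are consulted first, so the extra cruft packs would only
cost anything when an object is missing from every regular pack.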

> Speaking of which, what is the mtime of the newly created cruft pack? Is
> it the current mtime? Then those unreachable objects will stick for
> another 2 weeks, instead of being back-dated to their pack's date. You
> could back-date to the mtime of the most recent deleted pack, but that
> would still prolong the life of objects from the older packs. It may be
> acceptable to just ignore the issue, though; they will expire
> eventually.

Well, we have that problem today when "git pack-objects
--unpack-unreachable" explodes unreferenced objects --- they are
written with the current mtime.  I assume you're worried about
pre-existing loose objects that get collected up into a new cruft
pack, since they would get the extra two weeks of life.  Given how
much more efficient it is to store the cruft objects in a pack, I think
ignoring the issue makes the most sense, since it's a one-time
extension, and the extra objects really won't do any harm.
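
For comparison, the back-dating alternative raised in the quoted text
could be sketched roughly as follows.  This is only an illustration in
Python under stated assumptions: the function name and arguments are
hypothetical, and it takes the newest mtime among the files that were
collected into the new pack.

```python
import os

def backdate_to_newest(pack_path, source_paths):
    """Set pack_path's mtime to the newest mtime among source_paths.

    Hypothetical helper: source_paths stands for the packs or loose
    objects whose contents were collected into the new cruft pack.
    """
    newest = max(os.stat(path).st_mtime for path in source_paths)
    # os.utime takes (atime, mtime); both are set to the same value here.
    os.utime(pack_path, (newest, newest))
    return newest
```

Even with this, objects that came from the older sources would still
inherit the newest timestamp, which is the life extension described
above.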

One last thought: if a sysadmin is really hard up for space (and if
the cruft objects include some really big sound or video files), one
advantage of labelling the cruft packs explicitly is that someone who
really needs the space could find the oldest cruft files and delete
them, since they would be tagged and easy to find.
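
A minimal sketch of how the oldest cruft packs might be located,
assuming the "cruft-<SHA1>" naming proposed in this thread maps to
files called cruft-*.pack in the pack directory (that filename pattern
and the function name are assumptions, not existing git behaviour):

```python
import os

def cruft_packs_oldest_first(pack_dir):
    """List hypothetical cruft-*.pack files in pack_dir, oldest mtime first."""
    candidates = [
        os.path.join(pack_dir, name)
        for name in os.listdir(pack_dir)
        if name.startswith("cruft-") and name.endswith(".pack")
    ]
    # Sort by modification time so the oldest pack comes first.
    return sorted(candidates, key=os.path.getmtime)
```

The function only lists candidates; anyone deleting a pack by hand
would presumably also want to remove its companion index file.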

                                        - Ted