Re: [PATCH] enable core.fsyncObjectFiles by default

On Mon, Jan 22, 2018 at 01:09:03PM -0500, Theodore Ts'o wrote:

> > Wouldn't it also make gc pruning more expensive? Now you can repack
> > regularly and loose objects will be left out of the pack, and then just
> > rm'd, whereas now it would entail creating new packs (unless the whole
> > pack was objects meant for removal).
> 
> The idea is that the cruft pack would be all objects that were no
> longer referenced.  Hence the proposal that if they ever *are*
> accessed, they would be exploded to a loose object at that point.  So
> in the common case, the GC would go quickly since the entire pack
> could just be rm'ed once it hit the designated expiry time.

I think Ævar is talking about the case of:

  1. You make 100 objects that aren't referenced. They're loose.

  2. You run git-gc. They're still too recent to be deleted.

Right now those recent loose objects sit loose, and have zero cost at
the time of gc.  In a "cruft pack" world, you'd pay some I/O to copy
them into the cruft pack, and some CPU to zlib and delta-compress them.
I think that's probably fine, though.
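
Something like the following (untested; oid output elided) shows that
situation: a recent, unreferenced loose object just sits there across a
gc:

  $ git init --quiet demo && cd demo
  $ echo cruft | git hash-object -w --stdin   # write an unreferenced loose blob
  $ git count-objects -v | grep '^count'
  count: 1
  $ git gc --quiet                            # default gc.pruneExpire is 2.weeks.ago
  $ git count-objects -v | grep '^count'
  count: 1                                    # still loose; too recent to prune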

That said, some of what you wrote left me confused about whether we're
all talking about the same idea. ;) Let me describe the idea I had
mentioned in another thread. Right now the behavior is basically this:

If an unreachable object becomes referenced again (say, a new object is
created that points to it), nothing happens immediately; it is not
exploded or repacked on the spot. At the next gc, the new referencing
object falls into one of three cases:

  1. Reachable from refs, in which case it carries along the
     formerly-cruft object into the new pack, since it is now also
     reachable.

  2. Unreachable but still recent by mtime; we keep such objects and
     anything they reference. Today they are left loose (or exploded
     loose if they were previously packed); in this proposal they would
     go into the cruft pack instead (see the repack sketch after this
     list).

  3. Unreachable and old. Both the referencing object and the cruft one
     can be dropped entirely.
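
You can see the "exploded loose" part of case 2 with plain repack. An
untested sketch ("topic" just stands in for whatever ref went away):

  $ git repack -a -d --quiet               # everything reachable goes into one pack
  $ git update-ref -d refs/heads/topic     # drop the only ref to some objects
  $ git reflog expire --expire=now --all   # ...and any reflog entries still naming them
  $ git repack -A -d --quiet               # -A: newly-unreachable packed objects become loose
  $ git count-objects -v                   # ...and they show up in the loose count again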

The current strategy uses mtimes to decide what counts as "recent"; for
loose objects that is the object's own mtime, and for packed objects we
use the pack's mtime for every object in the pack.
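
For loose objects you can poke at that distinction directly (a sketch;
output trimmed):

  $ git fsck --unreachable                     # everything unreachable, recent or not
  $ git prune --dry-run --expire=2.weeks.ago   # just the loose ones old enough to drop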

So if we pack all the loose objects into a cruft pack, the mtime of the
cruft pack becomes the new gauge for "recent". And if we migrate objects
from the old cruft pack to a new cruft pack at each gc, then they'll
keep getting their mtimes refreshed, and we'll never drop them.

So we need to either:

  - keep per-object mtimes, so that old ones can age out (i.e., they'd
    hit case 3 and just not get migrated to either the new "real" pack
    or the new cruft pack).

  - keep multiple cruft packs, and let whole packs age out (a trivial
    version of that check is sketched after this list). But then cruft
    objects which get referenced again by other cruft have to get copied
    (not moved!) to new packs. That _probably_ doesn't happen all that
    often, so it might be OK.
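
For the whole-pack option, the aging check could be as dumb as an mtime
test on the pack files themselves (the "cruft-" name prefix here is
hypothetical, purely for illustration):

  $ find .git/objects/pack -maxdepth 1 -name 'cruft-*.pack' -mtime +14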

> Another way of doing things would be to use the mtime of the cruft
> pack for the expiry time, and if the cruft pack is ever referenced,
> its mtime would get updated.  Yet a third way would be to simply clear
> the "cruft" bit if it ever *is* referenced.  In the common case, it
> would never be referenced, so it could just get deleted, but in the
> case where the user has manually "rescued" a set of commits (perhaps
> by explicitly setting a branch head to commit id found from a reflog),
> the objects would be saved.

I don't think we have to worry about "rescued" objects. Those are
reachable, so they'd get copied into the new "real" pack (and then their
cruft pack eventually deleted).

-Peff


