Re: [PATCH 13/16] prune: keep objects reachable from recent objects

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 03 Oct 2014 14:47:57 -0700

Jeff King <peff@xxxxxxxx> writes:

> Instead, this patch pushes the extra work onto prune, which
> runs less frequently (and has to look at the whole object
> graph anyway). It creates a new category of objects: objects
> which are not recent, but which are reachable from a recent
> object. We do not prune these objects, just like the
> reachable and recent ones.
>
> This lets us avoid the recursive check above, because if we
> have an object, even if it is unreachable, we should have
> its referent:
>
>   - if we are creating new objects, then we cannot create
>     the parent object without having the child
>
>   - and if we are pruning objects, will not prune the child
>     if we are keeping the parent

Sorry but this part is beyond a simple panda brain.

I can understand this

	If we have an object, even if it is unreachable, we
	should have its referent.

as a description of the desired behaviour.  If we have a tree that
is unreachable, we must make sure that we do not discard blobs that
are reachable from that tree, or we would end up corrupting our
repository if we ever allow that tree to become reachable from our
refs later.

But how does that connect to these two bullet points?

>   - if we are creating new objects, then we cannot create
>     the parent object without having the child

We cannot create the parent (e.g. "tree") without having the child
(e.g. "blob that is referred to by the tree we are creating").
So this bullet point is repeating the same thing?

>   - and if we are pruning objects, will not prune the child
>     if we are keeping the parent

We will not prune a "blob" that is reachable from a "tree" that we
are not yet ready to prune.  So this again is repeating the same
thing?

But these are "this is how we want our system to behave".  And if we
assume our system behaves like so, then prune would be safe.

But it is unclear how that behaviour is realized.  Puzzled...

... goes and thinks ...

With this patch applied, the system will not prune unreachable old
objects that are reachable from a recent object (the recent object
itself may or may not be reachable but that does not make any
difference).  And that is sufficient to ensure the integrity of the
repository even if you allow new objects to be created reusing any
of these unreachable objects that are left behind by prune, because
the reachability check done during prune (with this patch applied)
makes sure any object left in the repository can safely be used as a
starting point of connectivity traversal.
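
As a rough illustration of the rule described above, here is a toy
model in Python.  It is not the C code from the patch: the names
reachable_from() and prune(), the dict-based object graph, and the
mtime/now/expire parameters are all assumptions made up for this
sketch.  It only shows that keeping "reachable from refs" plus
"recent" plus "reachable from recent" leaves no surviving object with
a missing referent.

```python
from typing import Dict, Iterable, List, Set


def reachable_from(starts: Iterable[str], graph: Dict[str, List[str]]) -> Set[str]:
    """Return every object reachable from `starts`, including the starts."""
    seen: Set[str] = set()
    stack = [s for s in starts if s in graph]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(c for c in graph.get(obj, ()) if c not in seen)
    return seen


def prune(graph: Dict[str, List[str]], refs: Iterable[str],
          mtime: Dict[str, int], now: int, expire: int) -> Dict[str, List[str]]:
    """Toy prune: drop an object only if it is old, unreachable from the
    refs, and also unreachable from every recent object."""
    # "Recent" objects are kept whether or not anything points at them.
    recent = {o for o in graph if now - mtime[o] < expire}
    # Walking from the recent objects keeps their (possibly old) referents.
    keep = reachable_from(refs, graph) | reachable_from(recent, graph)
    return {o: kids for o, kids in graph.items() if o in keep}


# Hypothetical repository: parent object -> objects it refers to.
graph = {
    "commit": ["tree_a"],
    "tree_a": ["blob_a"],
    "blob_a": [],
    "tree_new": ["blob_old"],   # recent but unreachable tree
    "blob_old": [],             # old, referenced only by tree_new
    "blob_junk": [],            # old, referenced by nothing
}
refs = ["commit"]
mtime = {o: 0 for o in graph}
mtime["tree_new"] = 95          # the only recent object

survivors = prune(graph, refs, mtime, now=100, expire=10)
print(sorted(survivors))
```

In this made-up repository, blob_old survives only because the recent
tree_new refers to it, while blob_junk goes away.  If tree_new later
becomes reachable from a ref, its blob is still there.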

Ok, I think I got it now, but then do we still need to utime(2) the
loose object files for unreachable objects that are referenced by
a recent object (which is done in a later patch), or is that purely
an optimization for the next round of gc where you would have more
recent objects (i.e. you do not have to traverse to find out an old
one is reachable from a new one, as there will be fewer old ones)?
