Jeff King <peff@xxxxxxxx> writes:

> Instead, this patch pushes the extra work onto prune, which
> runs less frequently (and has to look at the whole object
> graph anyway). It creates a new category of objects: objects
> which are not recent, but which are reachable from a recent
> object. We do not prune these objects, just like the
> reachable and recent ones.
>
> This lets us avoid the recursive check above, because if we
> have an object, even if it is unreachable, we should have
> its referent:
>
>   - if we are creating new objects, then we cannot create
>     the parent object without having the child
>
>   - and if we are pruning objects, will not prune the child
>     if we are keeping the parent

Sorry but this part is beyond a simple panda brain.

I can understand this

    If we have an object, even if it is unreachable, we should have
    its referent.

as a description of the desired behaviour. If we have a tree that
is unreachable, we must make sure that we do not discard blobs that
are reachable from that tree, or we would end up corrupting our
repository if we ever allow that tree to become reachable from our
refs later.

But how does that connect to these two bullet points?

>   - if we are creating new objects, then we cannot create
>     the parent object without having the child

We cannot create the parent (e.g. "tree") without having the child
(e.g. "blob that is referred to by the tree we are creating"). So
this bullet point is repeating the same thing?

>   - and if we are pruning objects, will not prune the child
>     if we are keeping the parent

We will not prune "blob" that are reachable from a "tree" that we
are not yet ready to prune. So this again is repeating the same
thing?

But these are "this is how we want our system to behave". And if
we assume our system behaves like so, then prune would be safe.

But it is unclear how that behaviour is realized.

Puzzled...

... goes and thinks ...
With this patch applied, the system will not prune unreachable old
objects that are reachable from a recent object (the recent object
itself may or may not be reachable but that does not make any
difference). And that is sufficient to ensure the integrity of the
repository even if you allow new objects to be created reusing any
of these unreachable objects that are left behind by prune, because
the reachability check done during prune (with this patch applied)
makes sure any object left in the repository can safely be used as
a starting point of connectivity traversal.

Ok, I think I got it now, but then do we still need to utime(2) the
loose object files for unreachable objects that are referenced by a
recent object (which is done in a later patch), or is that purely an
optimization for the next round of gc where you would have more
recent objects (i.e. you do not have to traverse to find out an old
one is reachable from a new one, as there will be fewer old ones)?
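
The keep rule described above (keep whatever is reachable from the
refs, whatever is recent, and whatever is reachable from a recent
object) can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not git's implementation,
which is written in C and works differently in detail. The names
`refs`, `mtime`, `tips` and `cutoff`, and the toy object names, are
all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def reachable_from(starts: Iterable[str], refs: Dict[str, List[str]]) -> Set[str]:
    """Return every object reachable from `starts`, including the starts."""
    seen: Set[str] = set()
    stack = list(starts)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        # refs maps an object to the objects it refers to (tree -> blobs, etc.)
        stack.extend(refs.get(oid, ()))
    return seen


def objects_to_keep(refs, mtime, tips, cutoff) -> Set[str]:
    """Hypothetical keep rule: traverse from ref tips AND from recent objects."""
    recent = {oid for oid, t in mtime.items() if t >= cutoff}
    return reachable_from(set(tips) | recent, refs)


def prune(refs, mtime, tips, cutoff) -> Set[str]:
    """Return the set of objects that would be discarded."""
    return set(mtime) - objects_to_keep(refs, mtime, tips, cutoff)


# Toy object graph (assumed data): one ref'd commit, one recent but
# unreachable tree pointing at an old blob, and one old orphan blob.
refs = {
    "ref_commit": ["ref_tree"],
    "ref_tree": ["ref_blob"],
    "new_tree": ["old_blob"],
}
mtime = {
    "ref_commit": 10, "ref_tree": 10, "ref_blob": 10,
    "new_tree": 100,   # recent, unreachable from any ref
    "old_blob": 10,    # old, unreachable, but referenced by new_tree
    "stale_blob": 10,  # old, unreachable, referenced by nothing
}
tips = ["ref_commit"]
cutoff = 50

print(sorted(prune(refs, mtime, tips, cutoff)))  # ['stale_blob']
```

In this sketch "old_blob" survives only because the recent
"new_tree" refers to it, so every object left behind still has its
referents and can serve as a starting point of a connectivity
traversal.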