How to get rid of reachability races? This email is meant more as a basis for discussion (including at the GitMerge conference that is currently running) than as a fully-baked proposal. A lot of reachability-related races have been discussed recently: * If a reference is renamed while a prune is reading the loose references, it could be that prune sees neither the old nor the new version of the reference and could prune the objects referred to by the reference. * If a process is building a tree, it could find that some objects are already in the DB, in which case it will refer to the existing objects. But these objects might be unreachable, and could be pruned before the process gets around to creating a root reference that makes them reachable. * Other races involving packed refs vs. loose refs. These races are most critical for busy hosting sites like GitHub but could theoretically impact anybody. It has been discussed that we should offer, as an alternative to packed-refs/loose refs in the filesystem, a way to store refs in a single database store of some kind that offers ACID guarantees. This would solve the first and last kind of race in one step. Other mechanisms are being discussed to address the second race, maybe involving moving pruned objects into a kind of purgatory for a period of time before they are finally deleted. But I have the feeling that these races could also be solved via the use of the same database. I am thinking of a scheme in which the database stores two more things during prunes: 1. The names of objects that are being pruned. Such objects are considered deleted (even though they might not yet have been deleted from the object database) and no new references may be made to them. 2. The names of objects to which new links have been added. Prune is not allowed to delete this objects or objects to which they refer. like those used for incremental garbage collection of memory, in which *during* a garbage collection the VM keeps tracks of new references that are made to objects that are in the generation that is currently being garbage collected. The garbage collector has to consider these links in its reachability analysis. Our situation is a little bit more complicated, because it is not easy to inform a git process that a prune has been started. At best, the process can check occasionally whether a prune is running. On the other hand, it *is* permissible for Git processes to fail in exceptional circumstances, as long as it happens rarely and leaves the object store in a valid state. So I'm thinking of a scheme in which 1. prune has to list "doomed" objects to the database before deleting them. Such objects are considered deleted by other processes (*if* they know that a prune is running). 2. If a process knows that a prune is running, then it records to the database any new links that it makes to existing objects. Prune is not allowed to prune such objects. 3. To catch the situation that a process didn't know that a prune is running, we keep a prune generation number. If that number changes between the beginning/end of a process, then the process fails without recording any new references. Design goals: * Correctness * "Normal processes" don't block on each other or on prune. * Normal processes don't get significantly slower in the usual case (but are allowed to get slower during pruning). Assumptions: * There is a store with atomic transactions. * Most normal processes run more quickly than a prune. It will be rare for an entire prune to run while a normal process is running (though even if it does we must behave correctly). * Normal processes are allowed to fail occasionally as long as they leave the DB in a valid state. In the store, record the following values: * Git references -- (refname, object_name) pairs * prune_in_progress (boolean) -- a destructive operation is running * prune_generation (integer) -- incremented each time objects are invalidated * doomed_objects -- a list of the names of objects that a prune is in the process of deleting. These objects *are considered deleted* and may not be referred to. Only used when prune_in_progress. * new_links -- a list of the names of objects newly created or newly referenced during a prune. These objects are considered referenced and *may not be pruned*. Only used when prune_in_progress. A nondestructive operation will usually have two interactions with the store. At start:: def record_new_link(sha1): """Record that there is a new link to object sha1.""" if prune_in_progress: if sha1 in doomed_objects: die() append sha1 to new_links (RAM copy) with transaction: read prune_generation read prune_in_progress if prune_in_progress: read doomed_objects read any needed Git refs for each new object or new link to an old object: record_new_link(object_name) with transaction: if prune_generation has changed: die() if prune_in_progress: read current doomed_objects: if any of new_links are in doomed_objects: die() append list of new git references to new_links write changes to git references # From this moment on, the new_links are considered to be # reachable and neither they nor any objects that they refer to # may be pruned. A destructive operation will have at least three interactions with the store:: with transaction: if prune_in_progress: abort prune_in_progress = True clear doomed_objects list clear new_links list known_links = list of all object references doomed_objects = compute list of objects unreachable from known_links while True: with transaction: write current version of doomed_objects read new_links from store clear new_links in store # From this moment on, objects in doomed_objects are # considered deleted. if new_links is empty: break else: remove objects reachable from new_links from doomed_objects delete doomed objects from filesystem with transaction: clear list of doomed objects increment stored value of prune_generation prune_in_progress = False I hope that's clear; please poke holes in it. Michael -- Michael Haggerty mhagger@xxxxxxxxxxxx http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html