If we want to be able to accelerate lookups which contain 'uninteresting' revisions, we must be able to specify what those revisions are. Signed-off-by: Sam Vilain <sam@xxxxxxxxxx> --- Documentation/technical/revision-cache.txt | 85 +++++++++++++++++++++------- 1 files changed, 65 insertions(+), 20 deletions(-) diff --git a/Documentation/technical/revision-cache.txt b/Documentation/technical/revision-cache.txt index 8349cfe..759d78d 100644 --- a/Documentation/technical/revision-cache.txt +++ b/Documentation/technical/revision-cache.txt @@ -8,6 +8,9 @@ A revision cache contains; - A 'start' object (ie, 'interesting' object ID) + - A list of 'end' objects, which may be empty (ie, 'uninteresting' + object IDs) + - A list of objects which are referred to by the 'start' object * position when sorted in --date-order @@ -18,10 +21,14 @@ A revision cache contains; the above list -Start Object ------------- +Start Objects and End Object +---------------------------- + +The 'start' object, and 'end' objects are the identifying key of the +revision cache. -The 'start' object is the identifying key of the revision cache. +The 'end' objects must be reachable from the 'start' objects, and none +of the 'end' objects may be reachable from other 'end' objects. Topological contents list @@ -61,14 +68,35 @@ answer. Determining Cache Suitability - rev_cache_suitable() ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -This is an auxiliary function used by the other use cases, when +This is an auxiliary function used by the other use cases. + +A revision cache is suitable whenever the walker encounters the single +object which is the 'end' object of the revision cache, and none of +the 'uninteresting' revisions to the walker are in the revision cache. The function is: - rev_cache_suitable( rev_cache, object ) + rev_cache_suitable( rev_cache, interesting, uninteresting [] ) + +The check is simple and fast - it first compares the 'interesting' +object to the 'start' object in the revision cache, then the +'uninteresting' objects that the walker is using are looked up in the +contents hash table. +Only if none of them are found is the revision cache suitable. + + +Revision cache stacking +^^^^^^^^^^^^^^^^^^^^^^^ + +If there are 'end' objects in the revision cache used, which do not +match the 'uninteresting' objects in the walker, the function may +recurse, looking for other revision caches which have the unwanted +'end' object as their 'start' object. -The check is simple and fast - it just compares the object to the -'start' object in the revision cache. +Stacking revision caches is a way to re-use an older index by just +generating a newer one which covers objects created after it. This +approach will be able to answer many questions, but not topological +ordering between objects which appeared in different caches. Revision Walker @@ -104,10 +132,12 @@ multiple 'interesting' objects were specified and 'ordered' is true, then the function returns false. If the ordering flag was set to false, then all of the 'interesting' -objects must be found in separate revision caches. +objects must be found in separate revision caches for the function to +return true. If any 'uninteresting' objects were passed, then the return value is -always false. +true if the suitability function passes for the revision caches which +are used. Returning objects @@ -118,7 +148,12 @@ list. rev_cache_fetch() : oid -If returning objects from a single revision cache, it opens the +If returning objects from a single or stacked set of revision caches, +it opens the caches and returns objects from them. If combining +results from multiple caches (where topological ordering of returned +objects is not important) then an in-memory object hash table must be +built, or the on-disk tables for all of the caches consulted along the +way. Accelerating in-progress walking @@ -142,7 +177,7 @@ an 'interesting' object added, then call: rev_cache_create( ) This function will not work, if multiple 'interesting' objects are -passed, or any 'uninteresting' objects were passed. +passed. This function must revision walk the commit graph, sorting in --date-order along the way, and may emit revisions as they are @@ -184,9 +219,12 @@ list. If multiple revision caches were used for multiple 'interesting' objects, then the number of objects can be obtained by building a hash table of all the objects in all of the caches, and returning the -number of hash table entries. +number of hash table entries. In the single-file stacked revision +cache case, the total can be found by adding up the total lengths of +the object lists. -If 'uninteresting' objects were passed, no cache can be suitable. +If 'uninteresting' objects were passed, the caches used must be +suitable according to the cache suitability function. Re-using deltas @@ -206,8 +244,9 @@ This implies that each delta base must be looked up in the on-disk hash table as they are written, which is both low impact and memory efficient. -For later fetches, the revision cache is not appropriate as they will -have 'uninteresting' objects set. +For later fetches, the revision cache will only be appropriate if the +'have' objects sent by the remote are not found in the revision +caches' object hash table. Re-using deltas - 'thin' pack @@ -217,8 +256,13 @@ This case is much like the normal re-using deltas case, except there is the complete set of objects that the remote has claimed to have to consider. -The cache is not yet sophisticated enough to assist with this use -case. +The revision cache can assist with this case only if the 'have' +objects sent by the remote match the 'uninteresting' objects in the +revision cache (or are not found in the object list), and another +revision cache exists which uses that 'uninteresting' object as its +'start' object. In this case, the lower stacked revision cache serves +as a handy lookup table as to whether the object is known to the +remote. 'shallow' clone considerations @@ -239,9 +283,10 @@ The cache is not yet sophisticated enough to assist with this case. creating bundles ~~~~~~~~~~~~~~~~ -So long as a bundle has no 'uninteresting' commits, then the revision -cache is completely appropriate; with the length of the object list it -can write a header and continue with the 'pack-objects' use case. +So long as a bundle's 'uninteresting' commits match that of the +revision cache, then the revision cache is completely appropriate; +with the length of the object list it can write a header and continue +with the 'pack-objects' use case. slicing bundles (mirror-sync) -- debian.1.5.6.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html