[PATCH 2/7] rev-cache: add on-disk format for fast reachability lookup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



As well as storing the sorted list of objects, store a hash table for
faster lookup.

Signed-off-by: Sam Vilain <sam@xxxxxxxxxx>
---
 Documentation/technical/revision-cache.txt |   24 ++++++++++++++++++++----
 1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/revision-cache.txt b/Documentation/technical/revision-cache.txt
index cc18535..8349cfe 100644
--- a/Documentation/technical/revision-cache.txt
+++ b/Documentation/technical/revision-cache.txt
@@ -14,6 +14,9 @@ A revision cache contains;
 
     * object ID
 
+  - A hash table from an (abbreviated) object ID to a position into
+    the above list
+
 
 Start Object
 ------------
@@ -35,6 +38,18 @@ objects are sorted as if they were commit objects with a single
 parent, the object they tag.
 
 
+Included object hash table
+--------------------------
+
+This index is used to quickly determine if an object exists in the
+index without scanning the entire topological list.
+
+Entries in the object hash table can be shortened, eg to 3 or 4 bytes;
+basically they just need to be long enough to avoid collisions within
+the objects which exist in the list.  Any match must be confirmed by
+checking the full SHA1 in the topological list.
+
+
 Use Cases
 ---------
 In this section, the key functions and operations that this index is
@@ -131,7 +146,8 @@ passed, or any 'uninteresting' objects were passed.
 
 This function must revision walk the commit graph, sorting in
 --date-order along the way, and may emit revisions as they are
-discovered to the topological object list.
+discovered to the topological object list.  It must also build a hash
+table of object IDs and emit it at the end.
 
 
 receive-pack/pack-objects
@@ -186,9 +202,9 @@ emitted, the delta from the packfile is re-used.  If a loop is
 detected or the delta base is not in the returned set of objects, then
 the delta is first resolved.
 
-This implies that the list of objects is first loaded into a hash
-table prior to returning any objects; however this is probably
-acceptable as the entire list is in one stream and will load quickly.
+This implies that each delta base must be looked up in the on-disk
+hash table as they are written, which is both low impact and memory
+efficient.
 
 For later fetches, the revision cache is not appropriate as they will
 have 'uninteresting' objects set.
-- 
debian.1.5.6.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]