[PATCH 4/9] commit-graph: document bloom filter format

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Garima Singh <garima.singh@xxxxxxxxxxxxx>

Update the technical documentation for commit-graph-format with BIDX
and BDAT chunk information.

RFC Notes:
1. [Call for advice] We specifically mention that we are using bloom
   filters in this technical document. Should this document also be
   made open to other data structures in the future, with versioning
   information?

2. [Call for advice] We are also not describing the explicit nature
   of how we store the bloom filter binary data. Would it be useful
   to document details about the hash algorithm, the number of hashes
   and the specific seed values we are using in a separate document,
   or perhaps in a separate section in this document?

Helped-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
Signed-off-by: Garima Singh <garima.singh@xxxxxxxxxxxxx>
---
 Documentation/technical/commit-graph-format.txt | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index a4f17441ae..6497f19f08 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -17,6 +17,9 @@ metadata, including:
 - The parents of the commit, stored using positional references within
   the graph file.
 
+- The bloom filter of the commit carrying the paths that were changed between
+  the commit and it's first parent.
+
 These positional references are stored as unsigned 32-bit integers
 corresponding to the array position within the list of commit OIDs. Due
 to some special constants we use to track parents, we can store at most
@@ -93,6 +96,20 @@ CHUNK DATA:
       positions for the parents until reaching a value with the most-significant
       bit on. The other bits correspond to the position of the last parent.
 
+  Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) [Optional]
+      For each commit we store the offset of its bloom filter in the BDAT chunk
+      as follows:
+      BIDX[i] = number of 8-byte words in all the bloom filters from commit 0 to
+		commit i (inclusive)
+
+  Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional]
+      * It starts with three 32 bit integers for the
+	    - version of the hash algorithm being used
+	    - the number of hashes used in the computation
+	    - the number of bits per entry
+	  * The rest of the chunk is the concatenation of all the computed bloom 
+	  filters for the commits in lexicographic order.
+
   Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional]
       This list of H-byte hashes describe a set of B commit-graph files that
       form a commit-graph chain. The graph position for the ith commit in this
-- 
gitgitgadget




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux