Re: [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/20/2022 8:33 AM, Abhradeep Chakraborty via GitGitGadget wrote:
> From: Abhradeep Chakraborty <chakrabortyabhradeep79@xxxxxxxxx>
> 
> When reading bitmap file, git loads each and every bitmap one by one
> even if all the bitmaps are not required. A "bitmap lookup table"
> extension to the bitmap format can reduce the overhead of loading
> bitmaps which stores a list of bitmapped commit oids, along with their
> offset and xor offset. This way git can load only the neccesary bitmaps
> without loading the previous bitmaps.
> 
> Add some information for the new "bitmap lookup table" extension in the
> bitmap-format documentation.


> @@ -67,6 +67,14 @@ MIDXs, both the bit-cache and rev-cache extensions are required.
>  			pack/MIDX. The format and meaning of the name-hash is
>  			described below.
>  
> +			** {empty}
> +			BITMAP_OPT_LOOKUP_TABLE (0xf) : :::
> +			If present, the end of the bitmap file contains a table
> +			containing a list of `N` object ids, a list of pairs of
> +			offset and xor offset of respective objects, and 4-byte
> +			integer denoting the flags (currently none). The format
> +			and meaning of the table is described below.
> +

Here, you are adding a new flag that indicates that the end of the file
contains this extra extension. This works because the size of the
extension is predictable. As long as any future extensions are also of
a predictable size, then we can continue adding them via flags in this
way.

This is better than updating the full file format to do something like
like use the chunk format API, especially because this format is shared
across other tools (JGit being mentioned frequently).

It might be worth mentioning in your commit message what happens when an
older version of Git (or JGit) notices this flag. Does it refuse to
operate on the .bitmap file? Does it give a warning or die? It would be
nice if this extension could be ignored (it seems like adding the extra
data at the end does not stop the bitmap data from being understood).

> +
> +Commit lookup table
> +-------------------
> +
> +If the BITMAP_OPT_LOOKUP_TABLE flag is set, the end of the `.bitmap`
> +contains a lookup table specifying the positions of commits which have a
> +bitmap.

Perhaps it would be better to say "the last N * (HASH_LEN + 8) + 4 bytes
preceding the trailing hash" or something? This gives us a concrete way
to compute the start of the table, while also being clear that the table
is included in the trailing hash.

> +For a `.bitmap` containing `nr_entries` reachability bitmaps, the format
> +is as follows:
> +
> +	- `nr_entries` object names.

Could you expand that these objects are commit OIDs, one for each bitmap
in the file. Are they sorted in lexicographical order for binary search,
or are we expecting to read the entire table into a hashtable in-memory?

> +	- `nr_entries` pairs of 4-byte integers, each in network order.
> +	  The first holds the offset from which that commit's bitmap can
> +	  be read. The second number holds the position of the commit
> +	  whose bitmap the current bitmap is xor'd with in lexicographic
> +	  order, or 0xffffffff if the current commit is not xor'd with
> +	  anything.

Interesting to give the xor chains directions here. You say "position"
here for the second commit: do you mean within the list of object names
as opposed to the offset? That would make the most sense so we can trace
the full list of XORs we need to make all at once.

Are .bitmap files already constrained to 4GB, so these 32-bit offsets
make sense? Using 64-bit offsets would be a small cost here, I think,
without needing to do any fancy "overflow" tables that could introduce
a variable-length extension.

> +	- One 4-byte network byte order integer specifying
> +	  table-specific flags. None exist currently, so this is always
> +	  "0".

I'm guessing this is at the end of the extension because a future flag
could modify the length of the extension, so we need the flags to be
in a predictable location. Could we make that clear somewhere?

How does Git react to seeing flags here that it does not recognize?
It seems that Git should ignore the lookup table but continue using the
rest of the .bitmap file as it did before, yes?

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux