On 4/10/2018 3:10 PM, Stefan Beller wrote:
Hi Derrick,
On Tue, Apr 10, 2018 at 5:55 AM, Derrick Stolee <stolee@xxxxxxxxx> wrote:
+ OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+ The ith entry, F[i], stores the number of OIDs with first
+ byte at most i. Thus F[255] stores the total
+ number of commits (N).
I was about to give this series one last read not expecting any questions
to come up (this series has had a lot of feedback already!)
Although I just did.
What were your design considerations for the fanout table?
Did you include it as the pack index has one or did you come up with
them from first principles?
Have you measured the performance impact of the fanout table
(maybe even depending on the size of the fanout) ?
context:
https://public-inbox.org/git/CAJo=hJsto1ik=GTC8c3+2_jBuUqcAPL0UWp-1uoYYMpgbLB+qg@xxxxxxxxxxxxxx/
(side note: searching the web for fanout makes it seem
as if it is git-lingo, apparently the term is not widely used)
I don't think we want to restart the design discussion,
I am just curious.
I knew that I wanted some amount of a fanout table, and the 256-entry
one was used for IDX files (and in my MIDX RFC). With the recent
addition of "packfile: refactor hash search with fanout table" [1] it is
probably best to keep the 256-entry table to reduce code clones.
As for speed, we have the notion of 'graph_pos' which gives random
access into the commit-graph after a commit is loaded as a parent of a
commit from the commit-graph file. Thus, we are spending time in the
binary search only for commits that do not exist in the commit-graph
file and those that are first found in the file. Thus, running profilers
on long commit-graph walks do not show any measurable time spent in
'bsearch_graph()'.
Thanks,
-Stolee
[1]
https://github.com/gitster/git/commit/b4e00f7306a160639f047b3421985e8f3d0c6fb1