On 5/22/2019 8:43 PM, Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, May 22 2019, Derrick Stolee via GitGitGadget wrote:
>
>> To keep lookups fast, but also keep most incremental writes fast, create
>> a strategy for merging levels of the commit-graph chain. The strategy is
>> detailed in the commit-graph design document, but is summarized by these
>> two conditions:
>>
>>   1. If the number of commits we are adding is more than half the number
>>      of commits in the graph below, then merge with that graph.
>>
>>   2. If we are writing more than 64,000 commits into a single graph,
>>      then merge with all lower graphs.
>>
>> The numeric values in the conditions above are currently constant, but
>> can become config options in a future update.
>> [...]
>> +## Merge Strategy
>> +
>> +When writing a set of commits that do not exist in the commit-graph stack of
>> +height N, we default to creating a new file at level N + 1. We then decide to
>> +merge with the Nth level if one of two conditions holds:
>> +
>> +  1. The expected file size for level N + 1 is at least half the file size
>> +     for level N.
>> +
>> +  2. Level N + 1 contains more than MAX_SPLIT_COMMITS commits (64,000
>> +     commits).
>> +
>> +This decision cascades down the levels: when we merge a level we create a new
>> +set of commits that then compares to the next level.
>> +
>> +The first condition bounds the number of levels to be logarithmic in the total
>> +number of commits. The second condition bounds the total number of commits in
>> +a `graph-{hashN}` file and not in the `commit-graph` file, preventing
>> +significant performance issues when the stack merges and another process only
>> +partially reads the previous stack.
>> +
>> +The merge strategy values (2 for the size multiple, 64,000 for the maximum
>> +number of commits) could be extracted into config settings for full
>> +flexibility.
>
> As noted this can become configurable, so it's no big deal. But is there
> any reason for this 64K limit anymore?
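For illustration, the cascading merge decision quoted above could be
sketched roughly as below. This is not the actual implementation (that
lives in commit-graph.c); the names (`levels_to_merge`, `SIZE_MULTIPLE`,
`MAX_COMMITS`) are hypothetical, and commit counts stand in for the
expected file sizes the design doc mentions:

```c
#include <assert.h>
#include <stddef.h>

#define SIZE_MULTIPLE 2	/* condition 1: new file >= half the level below */
#define MAX_COMMITS 64000	/* condition 2: cap on a non-base graph file */

/*
 * level_commits[0] is the base graph, level_commits[n - 1] the top of
 * the stack. num_new is the number of commits about to be written as a
 * new top level. Returns how many existing levels get absorbed into the
 * new file; the decision cascades downward as the merged set grows.
 */
static size_t levels_to_merge(const size_t *level_commits, size_t n,
			      size_t num_new)
{
	size_t merged = num_new;
	size_t i = n;

	while (i > 0) {
		size_t below = level_commits[i - 1];

		if (SIZE_MULTIPLE * merged >= below || merged > MAX_COMMITS) {
			merged += below;	/* absorb the level below */
			i--;
		} else {
			break;	/* both conditions fail: stop cascading */
		}
	}

	return n - i;
}
```

Once `merged` exceeds `MAX_COMMITS`, the condition stays true at every
remaining iteration, which is how the second rule ends up merging with
all lower graphs.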
There may not be an important reason to include it by default. Whatever
config option we use could have special values:

  * -1: No maximum commit limit (default)
  *  0: Never write more than one level.

I would personally set a limit somewhere around 64,000 to prevent long
chains and to keep some processes from hitting a concurrency issue where
their run-time commit-graph is effectively 100,000 commits behind.

> While with the default expiry of 0sec we can still get that race, it
> seems unlikely in practice, as the "commit-graph write" process would
> write a new manifest at the end, then go and unlink() the old files.

The process reading an old manifest can still only succeed partially, as
it reads the chain from bottom to top. Succeeding gracefully in this
case is built in, but should have a test! (noted)

But I don't think the concurrency window is any more lenient than
before, and we now have the flexibility to set a non-zero expiry window.
That was impossible in the other model.

> So maybe at this point we could make this even dumber with something
> that behaves like gc.autoPackLimit? I.e. keep writing new graphs, and
> then coalesce them all (or maybe not the "base" graph, like
> gc.bigPackThreshold)?

I will look into these existing settings and try to include similar
options here. Better to stay consistent when we can.

> Also: These docs refer to MAX_SPLIT_COMMITS, but in v2 it's now a
> "split_strategy_max_commits" variable instead.

Thanks for catching my out-of-date documentation.

-Stolee