On Sat, Aug 22, 2020 at 07:14:38PM +0200, Jakub Narębski wrote: > Hi Abhishek, > > ... > > However the commit message do not say anything about the *writing* side. > Revised the commit message to include the following at the end: When writing the new layer in split commit-graph, we write a GDAT chunk only if the topmost layer has a GDAT chunk. This guarantees that if a layer has GDAT chunk, all lower layers must have a GDAT chunk as well. Rewriting layers follows similar approach: if the topmost layer below set of layers being rewritten (in the split commit-graph chain) exists, and it does not contain GDAT chunk, then the result of rewrite does not have GDAT chunks either. > > ... > > To be more detailed, without '--split=replace' we would want the following > layer merging behavior: > > [layer with GDAT][with GDAT][without GDAT][without GDAT][without GDAT] > 1 2 3 4 5 > > In the split commit-graph chain above, merging two topmost layers > (layers 4 and 5) should create a layer without GDAT; merging three > topmost layers (and any other layers, e.g. two middle ones, i.e. 3 and > 4) should create a new layer with GDAT. > > [layer with GDAT][with GDAT][without GDAT][-------without GDAT-------] > 1 2 3 merged > > [layer with GDAT][with GDAT][-------------with GDAT------------------] > 1 2 merged > > I hope those ASCII-art pictures help understanding it > Thanks! There were helpful. While we work as expected in the first scenario i.e merging 4 and 5, we would *still* write a layer without GDAT in the second scenario. I have tweaked split_graph_merge_strategy() to fix this: ---------------------------------------------- diff --git a/commit-graph.c b/commit-graph.c index 6d54d9a286..246fad030d 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1973,6 +1973,9 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) } } + if (!ctx->write_generation_data && g->chunk_generation_data) + ctx->write_generation_data = 1; + if (flags != COMMIT_GRAPH_SPLIT_REPLACE) ctx->new_base_graph = g; else if (ctx->num_commit_graphs_after != 1) ---------------------------------------------------- That is, if we were not writing generation data (because of mixed generation concerns) but the new topmost layer has a generation data chunk, we have merged all layers without GDAT chunk and can now write a GDAT chunk safely. > > > > It is difficult to expose this issue in a test. Since we _start_ with > > artificially low generation numbers, any commit walk that prioritizes > > generation numbers will walk all of the commits with high generation > > number before walking the commits with low generation number. In all the > > cases I tried, the commit-graph layers themselves "protect" any > > incorrect behavior since none of the commits in the lower layer can > > reach the commits in the upper layer. > > > > This issue would manifest itself as a performance problem in this case, > > especially with something like "git log --graph" since the low > > generation numbers would cause the in-degree queue to walk all of the > > commits in the lower layer before allowing the topo-order queue to write > > anything to output (depending on the size of the upper layer). > > Wouldn't breaking the reachability condition promise make some Git > commands to return *incorrect* results if they short-circuit, stop > walking if generation number shows that A cannot reach B? > > I am talking here about commands that return boolean, or select subset > from given set of revisions: > - git merge-base --is-ancestor <B> <A> > - git branch branch-A <A> && git branch --contains <B> > - git branch branch-B <B> && git branch --merged <A> > > Git assumes that generation numbers fulfill the following condition: > > if A can reach B, then gen(A) > gen(B) > > Notably this includes commits not in commit-graph, and clamped values. > > However, in the following case > > * if commit A is from higher layer without GDAT > and uses topological levels for 'generation', e.g. 115 (in a small repo) > * and commit B is from lower layer with GDAT > and uses corrected commit date as 'generation', for example 1598112896, > > it may happen that A (later commit) can reach B (earlier commit), but > gen(B) > gen(A). The reachability condition promise for generation > numbers is broken. > > > > > Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@xxxxxxxxx> > > --- > > I have reordered files in the patch itself to make it easier to review > the proposed changes. > > > commit-graph.h | 1 + > > commit-graph.c | 32 +++++++++++++++- > > t/t5324-split-commit-graph.sh | 70 +++++++++++++++++++++++++++++++++++ > > 3 files changed, 102 insertions(+), 1 deletion(-) > > > > diff --git a/commit-graph.h b/commit-graph.h > > index f78c892fc0..3cf89d895d 100644 > > --- a/commit-graph.h > > +++ b/commit-graph.h > > @@ -63,6 +63,7 @@ struct commit_graph { > > struct object_directory *odb; > > > > uint32_t num_commits_in_base; > > + uint32_t read_generation_data; > > struct commit_graph *base_graph; > > > > First, why `read_generation_data` is of uint32_t type, when it stores > (as far as I understand it), a "boolean" value of either 0 or 1? > Yes, using unsigned int instead of uint32_t (although in most of cases it would be same). If commit_graph had other flags as well, we could have used a bit field. > Second, couldn't we simply set chunk_generation_data to NULL? Or would > that interfere with the case of rewriting, where we want to use existing > GDAT data when writing new commit-graph with GDAT chunk? It interferes with rewriting the split commit-graph, as you might have guessed from the above code snippet. > > ... > > > diff --git a/commit-graph.c b/commit-graph.c > > > graph_data->generation = item->date + > > (timestamp_t) get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index); > > else > > @@ -885,6 +908,7 @@ void load_commit_graph_info(struct repository *r, struct commit *item) > > uint32_t pos; > > if (!prepare_commit_graph(r)) > > return; > > + > > if (find_commit_in_graph(item, r->objects->commit_graph, &pos)) > > fill_commit_graph_info(item, r->objects->commit_graph, pos); > > } > > This is unrelated whitespace fix, a "while at it" in neighbourhood of > changes. All right then. > Reverted this change, as it's unimportant. > > @@ -2192,6 +2216,9 @@ int write_commit_graph(struct object_directory *odb, > > ... > > It would be nice to have an example with merging layers (whether we > would handle it in strict or relaxed way). > Sure, will add. > > + > > test_done > > Best, > -- > Jakub Narębski Thanks - Abhishek