On Mon, Apr 24, 2023 at 10:44 PM Derrick Stolee <derrickstolee@xxxxxxxxxx> wrote: > > > This is the same case as [1]. > > > > 1. https://lore.kernel.org/git/20221025232934.1504445-1-jonathantanmy@xxxxxxxxxx/ > > Thanks for the link, though this could be replaced with > > 4654134976f (negotiator/skipping: avoid stack overflow, 2022-10-25) > > now that the change exists in the commit history. make sense. > > One thing that is missing from that change is a test, and such a test > could be generalized to apply to all negotiators. This could maybe > help any potential future negotiator avoid this bug. Did you think > about what such a test could look like? Perhaps test_commit_bulk > could help, but we'd probably need to create so many commits that the > test would need to be marked as expensive. That's probably a major > reason to not include a test and rely on avoiding recursion when > possible. I first found this issue in a large repository with numerous merge commits. To address it, I added a test case which fast-imports 10,000 commits and runs them through run_with_limited_stack(). Although expensive, this approach was successful in executing the test case without any issues. > > > - if (commit != NULL && !(commit->object.flags & COMMON)) { > > + struct prio_queue queue = { NULL }; > > + > > + prio_queue_put(&queue, commit); > > Should we check the conditions what were removed? The COMMON flag > is likely only useful for the recursion, but prio_queue_put() is > not careful about NULL values. However, no callers should be > providing NULL commits here. > > Couldn't hurt to add > > if (!commit) > return; make sense. > > before the prio_queue_put(). > > > + while ((commit = prio_queue_get(&queue))) { > > struct object *o = (struct object *)commit; > > > > + if (commit == NULL || (commit->object.flags & COMMON)) > > + continue; > > The NULL condition is definitely unnecessary here as it is checked > by the while condition. The "& COMMON" is helpful if the commit > gained the COMMON flag after being inserted into the queue. > > > if (!ancestors_only) > > o->flags |= COMMON; > > > > > > @@ -70,15 +76,17 @@ static void mark_common(struct negotiation_state *ns, struct commit *commit, > > ns->non_common_revs--; > > if (!o->parsed && !dont_parse) > > if (repo_parse_commit(the_repository, commit)) > > - return; > > + continue; > > > > + ancestors_only = 0; > > This caught me off guard, but this flag essentially says "should > I mark the first commit as common or not?". It would probably be > clearer if this was done before the loop, and then was ignored > within the loop, setting the flag on each parent in this loop: > > > for (parents = commit->parents; > > parents; > > parents = parents->next) > > - mark_common(ns, parents->item, 0, > > - dont_parse); > > + prio_queue_put(&queue, parents->item); > I'll think about how to optimize this again. ancestors_only is used multiple times in the original logic: 1. if (!ancestors_only) o->flags |= COMMON; 2. if (!(o->flags & SEEN)) rev_list_push(ns, commit, SEEN); else { struct commit_list *parents; if (!ancestors_only && !(o->flags & POPPED)) ns->non_common_revs--; Should we use this ? if (!ancestors_only) { commit->object.flags |= COMMON; if ((commit->object.flags & SEEN) && !(commit->object.flags & POPPED)) ns->non_common_revs--; } and for (parents = commit->parents; parents; parents = parents->next) { if (parents->item->object.flags & COMMON) continue; parents->item->object.flags |= COMMON; if ((parents->item->object.flags & SEEN) && !(parents->item->object.flags & POPPED)) ns->non_common_revs--; prio_queue_put(&queue, parents->item); } > It would have an extra benefit: your walk may duplicate objects in the > priority queue (there is no duplicate protection in prio_queue_put). > But, we could use > > if (!(parents->item->object.flags & COMMON)) { > parents->item->object.flags |= COMMON; > prio_queue_put(&queue, parents->item); > } > > as duplicate protection _and_ a clearer way to demonstrate what > ancestors_only is doing. Without this protection, it is possible > to have exponential growth in the priority queue using simple > merge commits. > > You'd need this at the beginning: > > if (!commit) > return; > > prio_queue_put(&queue, commit); > if (!ancestors_only) > commit->object.flags |= COMMON; Make sense. > > diff --git a/negotiator/skipping.c b/negotiator/skipping.c > > index c7d6ab39bc..3d262b3533 100644 > > --- a/negotiator/skipping.c > > +++ b/negotiator/skipping.c > > @@ -108,6 +108,8 @@ static void mark_common(struct data *data, struct commit *seen_commit) > > prio_queue_put(&queue, p->item); > > } > > } > > + > > + clear_prio_queue(&queue); > > This memory leak cleanup in the skipping negotiator is good to > do, but should be split into its own change. > > In addition, the mark_common() method there seems to have a few > problems: > > 1. It does not do duplicate protection before prio_queue_put(). > (The COMMON bit would work here, too.) > 2. When it translated from recursive to iterative it kept "return" > statements that should probably be "continue" statements. > 3. It does not attempt to parse commits, and instead returns > immediately when finding an unparsed commit. This is something > that it did in its original version, so maybe it is by design, > but it doesn't match the doc comment for the method. > > Consider fixing these issues while you are here. > Make sense. Thanks. -Han Xin