Re: [PATCH v1] negotiator/default.c: avoid stack overflow

Derrick Stolee <derrickstolee@xxxxxxxxxx> · Mon, 24 Apr 2023 10:44:38 -0400

On 4/23/2023 10:23 PM, Han Xin wrote:
> mark_common() in negotiator/default.c may overflow the stack due to
> recursive function calls. Avoid this by instead recursing using a
> heap-allocated data structure.

I'm really happy to see that since you could replace the if statement
with a while statement, most of the existing logic could stay without
a bunch of whitespace changes.

> This is the same case as [1].
> 
> 1. https://lore.kernel.org/git/20221025232934.1504445-1-jonathantanmy@xxxxxxxxxx/

Thanks for the link, though this could be replaced with

  4654134976f (negotiator/skipping: avoid stack overflow, 2022-10-25)

now that the change exists in the commit history.

One thing that is missing from that change is a test, and such a test
could be generalized to apply to all negotiators. This could maybe
help any potential future negotiator avoid this bug. Did you think
about what such a test could look like? Perhaps test_commit_bulk
could help, but we'd probably need to create so many commits that the
test would need to be marked as expensive. That's probably a major
reason to not include a test and rely on avoiding recursion when
possible.

> -	if (commit != NULL && !(commit->object.flags & COMMON)) {
> +	struct prio_queue queue = { NULL };
> +
> +	prio_queue_put(&queue, commit);

Should we check the conditions what were removed? The COMMON flag
is likely only useful for the recursion, but prio_queue_put() is
not careful about NULL values. However, no callers should be
providing NULL commits here.

Couldn't hurt to add

	if (!commit)
		return;

before the prio_queue_put().

> +	while ((commit = prio_queue_get(&queue))) {
>  		struct object *o = (struct object *)commit;
>  
> +		if (commit == NULL || (commit->object.flags & COMMON))
> +			continue;

The NULL condition is definitely unnecessary here as it is checked
by the while condition. The "& COMMON" is helpful if the commit
gained the COMMON flag after being inserted into the queue.

>  		if (!ancestors_only)
>  			o->flags |= COMMON;
>  

> @@ -70,15 +76,17 @@ static void mark_common(struct negotiation_state *ns, struct commit *commit,
>  				ns->non_common_revs--;
>  			if (!o->parsed && !dont_parse)
>  				if (repo_parse_commit(the_repository, commit))
> -					return;
> +					continue;
>  
> +			ancestors_only = 0;

This caught me off guard, but this flag essentially says "should
I mark the first commit as common or not?". It would probably be
clearer if this was done before the loop, and then was ignored
within the loop, setting the flag on each parent in this loop:

>  			for (parents = commit->parents;
>  					parents;
>  					parents = parents->next)
> -				mark_common(ns, parents->item, 0,
> -					    dont_parse);
> +				prio_queue_put(&queue, parents->item);

It would have an extra benefit: your walk may duplicate objects in the
priority queue (there is no duplicate protection in prio_queue_put).
But, we could use

	if (!(parents->item->object.flags & COMMON)) {
		parents->item->object.flags |= COMMON;
		prio_queue_put(&queue, parents->item);
	}

as duplicate protection _and_ a clearer way to demonstrate what
ancestors_only is doing. Without this protection, it is possible
to have exponential growth in the priority queue using simple
merge commits.

You'd need this at the beginning:

	if (!commit)
		return;

	prio_queue_put(&queue, commit);
	if (!ancestors_only)
		commit->object.flags |= COMMON;
> diff --git a/negotiator/skipping.c b/negotiator/skipping.c
> index c7d6ab39bc..3d262b3533 100644
> --- a/negotiator/skipping.c
> +++ b/negotiator/skipping.c
> @@ -108,6 +108,8 @@ static void mark_common(struct data *data, struct commit *seen_commit)
>  				prio_queue_put(&queue, p->item);
>  		}
>  	}
> +
> +	clear_prio_queue(&queue);

This memory leak cleanup in the skipping negotiator is good to
do, but should be split into its own change.

In addition, the mark_common() method there seems to have a few
problems:

 1. It does not do duplicate protection before prio_queue_put().
    (The COMMON bit would work here, too.)
 2. When it translated from recursive to iterative it kept "return"
    statements that should probably be "continue" statements.
 3. It does not attempt to parse commits, and instead returns
    immediately when finding an unparsed commit. This is something
    that it did in its original version, so maybe it is by design,
    but it doesn't match the doc comment for the method.

Consider fixing these issues while you are here.

Thanks,
-Stolee