Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion

René Scharfe <l.s.r@xxxxxx> · Sat, 21 Sep 2019 14:37:18 +0200

Am 21.09.19 um 11:57 schrieb SZEDER Gábor:
> On Fri, Sep 20, 2019 at 08:14:07PM +0200, SZEDER Gábor wrote:
>> On Fri, Sep 20, 2019 at 08:13:02PM +0200, SZEDER Gábor wrote:
>>>> If the (re)introduced leak doesn't impact performance and memory
>>>> usage too much then duplicating tip_name again in name_rev() or
>>>> rather your new create_or_update_name() would likely make the
>>>> lifetimes of those string buffers easier to manage.
>>>
>>> Yeah, the easiest would be when each 'struct rev_name' in the commit
>>> slab would have its own 'tip_name' string, but that would result in
>>> a lot of duplicated strings and increased memory usage.
>>
>> I didn't measure how much more memory would be used, though.
>
> So, I tried the patch below to give each 'struct rev_name' instance
> its own copy of 'tip_name', and the memory usage of 'git name-rev
> --all' usually increased.
>
> The increase depends on how many merges and how many refs there are
> compared to the number of commits: the fewer merges and refs, the
> higher the more the memory usage increased:
>
>   linux:         +4.8%
>   gcc:           +7.2%
>   gecko-dev:     +9.2%
>   webkit:       +12.4%
>   llvm-project: +19.0%

Is that the overall memory usage or just for struct rev_name instances
and tip_name strings?  And how much is that in absolute terms?  (Perhaps
it's worth it to get the memory ownership question off the table at
least during the transformation to iterative processing.)

> git.git is the exception with its unusually high number of merge
> commits (about 25%), and the memory usage decresed by 4.4%.

Interesting.

I wonder why regular commits even need a struct name_rev.  Shouldn't
only tips and roots need ones?  And perhaps merges and occasional
regular "checkpoint" commits, to avoid too many duplicate traversals.

That's not exactly on-topic, though, and I didn't think all that
deeply about it, but perhaps switching to a different marking
strategy could get rid of recursion as a side-effect?  *waves hands
vaguely*

>
>
>  --- >8 ---
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 6969af76c4..62ab78242b 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -88,6 +88,7 @@ static struct rev_name *create_or_update_name(struct commit *commit,
>  		set_commit_rev_name(commit, name);
>  		goto copy_data;
>  	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
> +		free((char*) name->tip_name);
>  copy_data:
>  		name->tip_name = tip_name;

I would have expected a xstrdup() call here.

>  		name->taggerdate = taggerdate;
> @@ -106,21 +107,19 @@ static void name_rev(struct commit *start_commit,
>  {
>  	struct commit_list *list = NULL;
>  	const char *tip_name;
> -	char *to_free = NULL;
>
>  	parse_commit(start_commit);
>  	if (start_commit->date < cutoff)
>  		return;
>
>  	if (deref) {
> -		tip_name = to_free = xstrfmt("%s^0", start_tip_name);
> -		free((char*) start_tip_name);
> +		tip_name = xstrfmt("%s^0", start_tip_name);
>  	} else
> -		tip_name = start_tip_name;
> +		tip_name = strdup(start_tip_name);

This would not be needed with the central xstrdup() call mentioned above.

>
>  	if (!create_or_update_name(start_commit, tip_name, taggerdate, 0, 0,
>  				   from_tag)) {
> -		free(to_free);
> +		free((char*) tip_name);
>  		return;
>  	}
>
> @@ -139,7 +138,6 @@ static void name_rev(struct commit *start_commit,
>  			struct commit *parent = parents->item;
>  			const char *new_name;
>  			int generation, distance;
> -			const char *new_name_to_free = NULL;
>
>  			parse_commit(parent);
>  			if (parent->date < cutoff)
> @@ -159,11 +157,10 @@ static void name_rev(struct commit *start_commit,
>  					new_name = xstrfmt("%.*s^%d", (int)len,
>  							   name->tip_name,
>  							   parent_number);
> -				new_name_to_free = new_name;
>  				generation = 0;
>  				distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
>  			} else {
> -				new_name = name->tip_name;
> +				new_name = strdup(name->tip_name);

... and neither would this.

Sure the xstrfmt() result would be duplicated instead of being reused, but
that doesn't increase memory usage overall.

>  				generation = name->generation + 1;
>  				distance = name->distance + 1;
>  			}
> @@ -174,7 +171,7 @@ static void name_rev(struct commit *start_commit,
>  				last_new_parent = commit_list_append(parent,
>  						  last_new_parent);
>  			else
> -				free((char*) new_name_to_free);
> +				free((char*) new_name);
>  		}
>
>  		*last_new_parent = list;
> @@ -313,7 +310,7 @@ static int name_ref(const char *path, const struct object_id *oid, int flags, vo
>  		if (taggerdate == TIME_MAX)
>  			taggerdate = commit->date;
>  		path = name_ref_abbrev(path, can_abbreviate_output);
> -		name_rev(commit, xstrdup(path), taggerdate, from_tag, deref);
> +		name_rev(commit, path, taggerdate, from_tag, deref);
>  	}
>  	return 0;
>  }
>