Re: [PATCH 08/15] name-rev: pull out deref handling from the recursion

René Scharfe <l.s.r@xxxxxx> · Mon, 23 Sep 2019 20:43:11 +0200

Am 22.09.19 um 21:05 schrieb SZEDER Gábor:
> On Sat, Sep 21, 2019 at 02:37:18PM +0200, René Scharfe wrote:
>> Am 21.09.19 um 11:57 schrieb SZEDER Gábor:
>>> On Fri, Sep 20, 2019 at 08:14:07PM +0200, SZEDER Gábor wrote:
>>>> On Fri, Sep 20, 2019 at 08:13:02PM +0200, SZEDER Gábor wrote:
>>>>>> If the (re)introduced leak doesn't impact performance and memory
>>>>>> usage too much then duplicating tip_name again in name_rev() or
>>>>>> rather your new create_or_update_name() would likely make the
>>>>>> lifetimes of those string buffers easier to manage.
>>>>>
>>>>> Yeah, the easiest would be when each 'struct rev_name' in the commit
>>>>> slab would have its own 'tip_name' string, but that would result in
>>>>> a lot of duplicated strings and increased memory usage.
>>>>
>>>> I didn't measure how much more memory would be used, though.
>>>
>>> So, I tried the patch below to give each 'struct rev_name' instance
>>> its own copy of 'tip_name', and the memory usage of 'git name-rev
>>> --all' usually increased.
>>>
>>> The increase depends on how many merges and how many refs there are
>>> compared to the number of commits: the fewer merges and refs, the
>>> higher the more the memory usage increased:
>>>
>>>   linux:         +4.8%
>>>   gcc:           +7.2%
>>>   gecko-dev:     +9.2%
>>>   webkit:       +12.4%
>>>   llvm-project: +19.0%
>>
>> Is that the overall memory usage or just for struct rev_name instances
>> and tip_name strings?
>
> It's overall memory usage, the avarage of five runs of:
>
>   /usr/bin/time --format='%M' ~/src/git/git name-rev --all
>
>> And how much is that in absolute terms?
>
> git:     29801 ->  28514
> linux:  317018 -> 332218
> gcc:    106462 -> 114140
> gecko:  315448 -> 344486
> webkit:  55847 ->  62780
> llvm:   112867 -> 134384

I only have the first two handy, and I get numbers like this with
master:

git, lots of branches with long names: 3075476
git, local clone, single branch:       1349016
linux, single branch:                  1520468

O_o

>> (Perhaps
>> it's worth it to get the memory ownership question off the table at
>> least during the transformation to iterative processing.)
>
> I looked into it only because I got curious, but other than that I
> will definitely play the "beyond the scope of this patch series" card
> :)

Fair enough.

>> I wonder why regular commits even need a struct name_rev.  Shouldn't
>> only tips and roots need ones?  And perhaps merges and occasional
>> regular "checkpoint" commits, to avoid too many duplicate traversals.
>
> The 'struct rev_name' holds all info that's necessary to determine the
> commit's name.  It seems to be much simpler to just attach one to each
> commit and then retrieve it from the commit slab when printing the
> name of the commit than to come up with an algorithm where only a
> sleect set of commits get a 'struct rev_name', including how to access
> those when naming a commit that doesn't have one.

Sure, the lookup of individual commits is much easier once all commits
have name tags attached.  Preparing that sounds expensive, though.
It's a trade-off favoring looking up lots of names per program run.

>>>  --- >8 ---
>>>
>>> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
>>> index 6969af76c4..62ab78242b 100644
>>> --- a/builtin/name-rev.c
>>> +++ b/builtin/name-rev.c
>>> @@ -88,6 +88,7 @@ static struct rev_name *create_or_update_name(struct commit *commit,
>>>  		set_commit_rev_name(commit, name);
>>>  		goto copy_data;
>>>  	} else if (is_better_name(name, taggerdate, distance, from_tag)) {
>>> +		free((char*) name->tip_name);
>>>  copy_data:
>>>  		name->tip_name = tip_name;
>>
>> I would have expected a xstrdup() call here.
>
> But then we'd needed to release the results of all those xstrfmt()
> calls at the callsites of create_or_update_name(), so instead of those
> strdup() calls that you deem unnecessary we would need additional
> free() calls.

Correct.  That would be simpler and shouldn't affect peak memory
usage.

René