Am 22.09.19 um 21:05 schrieb SZEDER Gábor: > On Sat, Sep 21, 2019 at 02:37:18PM +0200, René Scharfe wrote: >> Am 21.09.19 um 11:57 schrieb SZEDER Gábor: >>> On Fri, Sep 20, 2019 at 08:14:07PM +0200, SZEDER Gábor wrote: >>>> On Fri, Sep 20, 2019 at 08:13:02PM +0200, SZEDER Gábor wrote: >>>>>> If the (re)introduced leak doesn't impact performance and memory >>>>>> usage too much then duplicating tip_name again in name_rev() or >>>>>> rather your new create_or_update_name() would likely make the >>>>>> lifetimes of those string buffers easier to manage. >>>>> >>>>> Yeah, the easiest would be when each 'struct rev_name' in the commit >>>>> slab would have its own 'tip_name' string, but that would result in >>>>> a lot of duplicated strings and increased memory usage. >>>> >>>> I didn't measure how much more memory would be used, though. >>> >>> So, I tried the patch below to give each 'struct rev_name' instance >>> its own copy of 'tip_name', and the memory usage of 'git name-rev >>> --all' usually increased. >>> >>> The increase depends on how many merges and how many refs there are >>> compared to the number of commits: the fewer merges and refs, the >>> higher the more the memory usage increased: >>> >>> linux: +4.8% >>> gcc: +7.2% >>> gecko-dev: +9.2% >>> webkit: +12.4% >>> llvm-project: +19.0% >> >> Is that the overall memory usage or just for struct rev_name instances >> and tip_name strings? > > It's overall memory usage, the avarage of five runs of: > > /usr/bin/time --format='%M' ~/src/git/git name-rev --all > >> And how much is that in absolute terms? > > git: 29801 -> 28514 > linux: 317018 -> 332218 > gcc: 106462 -> 114140 > gecko: 315448 -> 344486 > webkit: 55847 -> 62780 > llvm: 112867 -> 134384 I only have the first two handy, and I get numbers like this with master: git, lots of branches with long names: 3075476 git, local clone, single branch: 1349016 linux, single branch: 1520468 O_o >> (Perhaps >> it's worth it to get the memory ownership question off the table at >> least during the transformation to iterative processing.) > > I looked into it only because I got curious, but other than that I > will definitely play the "beyond the scope of this patch series" card > :) Fair enough. >> I wonder why regular commits even need a struct name_rev. Shouldn't >> only tips and roots need ones? And perhaps merges and occasional >> regular "checkpoint" commits, to avoid too many duplicate traversals. > > The 'struct rev_name' holds all info that's necessary to determine the > commit's name. It seems to be much simpler to just attach one to each > commit and then retrieve it from the commit slab when printing the > name of the commit than to come up with an algorithm where only a > sleect set of commits get a 'struct rev_name', including how to access > those when naming a commit that doesn't have one. Sure, the lookup of individual commits is much easier once all commits have name tags attached. Preparing that sounds expensive, though. It's a trade-off favoring looking up lots of names per program run. >>> --- >8 --- >>> >>> diff --git a/builtin/name-rev.c b/builtin/name-rev.c >>> index 6969af76c4..62ab78242b 100644 >>> --- a/builtin/name-rev.c >>> +++ b/builtin/name-rev.c >>> @@ -88,6 +88,7 @@ static struct rev_name *create_or_update_name(struct commit *commit, >>> set_commit_rev_name(commit, name); >>> goto copy_data; >>> } else if (is_better_name(name, taggerdate, distance, from_tag)) { >>> + free((char*) name->tip_name); >>> copy_data: >>> name->tip_name = tip_name; >> >> I would have expected a xstrdup() call here. > > But then we'd needed to release the results of all those xstrfmt() > calls at the callsites of create_or_update_name(), so instead of those > strdup() calls that you deem unnecessary we would need additional > free() calls. Correct. That would be simpler and shouldn't affect peak memory usage. René