On Sat, Feb 19, 2022 at 2:31 PM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote: > > On Sat, Feb 19 2022, Elijah Newren via GitGitGadget wrote: > > > From: Elijah Newren <newren@xxxxxxxxx> > > > > The struct strmap paths member of merge_options_internal is perhaps the > > most central data structure to all of merge-ort. Because all the paths > > involved in the merge need to be kept until the merge is complete, this > > "paths" data structure traditionally took responsibility for owning all > > the allocated paths. When the merge is over, those paths were free()d > > as part of free()ing this strmap. > > > > In commit 6697ee01b5d3 (merge-ort: switch our strmaps over to using > > memory pools, 2021-07-30), we changed the allocations for pathnames to > > come from a memory pool. That meant the ownership changed slightly; > > there were no individual free() calls to make, instead the memory pool > > owned all those paths and they were free()d all at once. > > > > Unfortunately unique_path() was written presuming the pre-memory-pool > > model, and allocated a path on the heap and left it in the strmap for > > later free()ing. Modify it to return a path allocated from the memory > > pool instead. > > This seems like a rather obvious fix to the leak, as the other side > wasn't ready to have the detached strbuf handed to it, and instead is > assuming everything is mempools. > > The downside is a bit of heap churn here since you malloc() & use the > strbuf just to ask for that size from the mempool, and then free() the > strbuf (of course we had that before, we just weren't free-ing). > > So this is just an aside & I have no idea if it's worth it, but FWIW you > can have your cake & eat it too here memory-allocation wise and avoid > the strbuf allocation entirely, and just use your mem-pool. > > Like this: > > diff --git a/merge-ort.c b/merge-ort.c > index 40ae4dc4e92..1111916d5cb 100644 > --- a/merge-ort.c > +++ b/merge-ort.c > @@ -731,6 +731,16 @@ static char *unique_path(struct merge_options *opt, > int suffix = 0; > size_t base_len; > struct strmap *existing_paths = &opt->priv->paths; > + /* > + * pre-size path + ~ + branch + _%d + "\0". Hopefully 6 digits > + * of suffix is enough for everyone? > + */ > + const size_t max_suffix = 6; > + const size_t expected_len = strlen(path) + 1 + strlen(branch) + 1 + > + max_suffix + 1; > + > + ret = mem_pool_alloc(&opt->priv->pool, expected_len); > + strbuf_attach(&newpath, ret, 0, expected_len); > > strbuf_addf(&newpath, "%s~", path); > add_flattened_path(&newpath, branch); > @@ -741,10 +751,10 @@ static char *unique_path(struct merge_options *opt, > strbuf_addf(&newpath, "_%d", suffix++); > } > > - /* Track the new path in our memory pool */ > - ret = mem_pool_alloc(&opt->priv->pool, newpath.len + 1); > - memcpy(ret, newpath.buf, newpath.len + 1); > - strbuf_release(&newpath); > + if (newpath.alloc > expected_len) > + BUG("we assumed too much thinking '%s~%s' would fit in %lu, ended up %lu ('%s')", > + path, branch, expected_len, newpath.alloc, newpath.buf); > + > return ret; > } > > > A bit nasty for sure, but if you're willing to BUG() out if we ever go > above 999999 suffix tries or whatever (which would be trivial to add to > the loop there) it's rather straightforward. > > I.e. we know the size of the buffer ahead of time, except for that loop > that'll add "_%d" to the end, and that can be made bounded. > > Obviously your solution's a lot simpler, so I think this is only > something you should consider if you think it matters for the > performance numbers linked to from 6697ee01b5d3. I'm not familiar enough > with merge-ort.c to know if it is in this case, or if this would be > pointless micro-optimization on a non-hot codepath. That's an interesting idea, but it's a micro-optimization on a very cold path. You need to either have a D/F conflict that persists or a mode-type conflict (e.g. an add/add conflict with symlink vs. file types). Those tend to be quite rare; in fact, the testcase with the performance numbers in 6697ee01b5d3 didn't have any of these and never even triggered this codepath (otherwise I would have caught the leak in my earlier leak testing). So I think simple and robust makes more sense here.