Re: [PATCH 2/2] merge-ort: fix small memory leak in unique_path()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Feb 19, 2022 at 2:31 PM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
> On Sat, Feb 19 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@xxxxxxxxx>
> >
> > The struct strmap paths member of merge_options_internal is perhaps the
> > most central data structure to all of merge-ort.  Because all the paths
> > involved in the merge need to be kept until the merge is complete, this
> > "paths" data structure traditionally took responsibility for owning all
> > the allocated paths.  When the merge is over, those paths were free()d
> > as part of free()ing this strmap.
> >
> > In commit 6697ee01b5d3 (merge-ort: switch our strmaps over to using
> > memory pools, 2021-07-30), we changed the allocations for pathnames to
> > come from a memory pool.  That meant the ownership changed slightly;
> > there were no individual free() calls to make, instead the memory pool
> > owned all those paths and they were free()d all at once.
> >
> > Unfortunately unique_path() was written presuming the pre-memory-pool
> > model, and allocated a path on the heap and left it in the strmap for
> > later free()ing.  Modify it to return a path allocated from the memory
> > pool instead.
>
> This seems like a rather obvious fix to the leak, as the other side
> wasn't ready to have the detached strbuf handed to it, and instead is
> assuming everything is mempools.
>
> The downside is a bit of heap churn here since you malloc() & use the
> strbuf just to ask for that size from the mempool, and then free() the
> strbuf (of course we had that before, we just weren't free-ing).
>
> So this is just an aside & I have no idea if it's worth it, but FWIW you
> can have your cake & eat it too here memory-allocation wise and avoid
> the strbuf allocation entirely, and just use your mem-pool.
>
> Like this:
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 40ae4dc4e92..1111916d5cb 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -731,6 +731,16 @@ static char *unique_path(struct merge_options *opt,
>         int suffix = 0;
>         size_t base_len;
>         struct strmap *existing_paths = &opt->priv->paths;
> +       /*
> +        * pre-size path + ~ + branch + _%d + "\0". Hopefully 6 digits
> +        * of suffix is enough for everyone?
> +        */
> +       const size_t max_suffix = 6;
> +       const size_t expected_len = strlen(path) + 1 + strlen(branch) + 1 +
> +               max_suffix + 1;
> +
> +       ret = mem_pool_alloc(&opt->priv->pool, expected_len);
> +       strbuf_attach(&newpath, ret, 0, expected_len);
>
>         strbuf_addf(&newpath, "%s~", path);
>         add_flattened_path(&newpath, branch);
> @@ -741,10 +751,10 @@ static char *unique_path(struct merge_options *opt,
>                 strbuf_addf(&newpath, "_%d", suffix++);
>         }
>
> -       /* Track the new path in our memory pool */
> -       ret = mem_pool_alloc(&opt->priv->pool, newpath.len + 1);
> -       memcpy(ret, newpath.buf, newpath.len + 1);
> -       strbuf_release(&newpath);
> +       if (newpath.alloc > expected_len)
> +               BUG("we assumed too much thinking '%s~%s' would fit in %lu, ended up %lu ('%s')",
> +                   path, branch, expected_len, newpath.alloc, newpath.buf);
> +
>         return ret;
>  }
>
>
> A bit nasty for sure, but if you're willing to BUG() out if we ever go
> above 999999 suffix tries or whatever (which would be trivial to add to
> the loop there) it's rather straightforward.
>
> I.e. we know the size of the buffer ahead of time, except for that loop
> that'll add "_%d" to the end, and that can be made bounded.
>
> Obviously your solution's a lot simpler, so I think this is only
> something you should consider if you think it matters for the
> performance numbers linked to from 6697ee01b5d3. I'm not familiar enough
> with merge-ort.c to know if it is in this case, or if this would be
> pointless micro-optimization on a non-hot codepath.

That's an interesting idea, but it's a micro-optimization on a very
cold path.  You need to either have a D/F conflict that persists or a
mode-type conflict (e.g. an add/add conflict with symlink vs. file
types).  Those tend to be quite rare; in fact, the testcase with the
performance numbers in 6697ee01b5d3 didn't have any of these and never
even triggered this codepath (otherwise I would have caught the leak
in my earlier leak testing).  So I think simple and robust makes more
sense here.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux