On 28 August 2017 at 10:06, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote: > On Sat, Aug 26, 2017 at 12:16 PM, Martin Ågren <martin.agren@xxxxxxxxx> wrote: >> On 25 August 2017 at 23:00, Junio C Hamano <gitster@xxxxxxxxx> wrote: >>> Martin Ågren <martin.agren@xxxxxxxxx> writes: >>> >>>> files_transaction_prepare() and the functions it calls add strings to a >>>> string list without duplicating them, i.e., we keep the original raw >>>> pointers we were given. That is "ok", since we keep them only for a >>>> short-enough time, but we end up leaking some of them. >>> >>> [...] > > Good find, Martin. > > First of all, you are right that we don't want any memory leaks here. > Nobody promises that the program will end soon if a reference update > fails. (In fact, there are invocations of `git` that trigger multiple > reference updates.) This is a small leak, but we should fix it. > > The problem (if I may take a stab at explaining it in my own words) is > that `files_transaction_prepare()` uses a `string_list` called > `affected_refnames` to ensure that the same reference name is not > modified more than once in a single reference transaction. Currently, > `affected_refnames` is configured *not* to duplicate the refnames that > are fed to it, which also means that it doesn't free the refnames when > it is cleared. > > This is correct for most refnames, which are owned by entries in the > `ref_transaction`, and therefore have a longer lifetime than > `affected_refnames`. > > But there is one code path that can add a refname to > `affected_refnames` without making a provision for the refname ever to > be freed: > > * files_transaction_prepare() > * lock_ref_for_update() > * split_symref_update() > * item = string_list_insert(affected_refnames, referent) > > The `referent` in the last statement comes from a `strbuf` that is > created in `lock_ref_for_update()` then passed to `lock_raw_ref()`, > which fills it. > > It would be a serious bug if `lock_ref_for_update()` would dispose of > `referent`, because the pointer in `affected_refnames` would point at > freed memory. But in fact we have the opposite problem; > `lock_ref_for_update()` never frees the memory (it doesn't even > `strbuf_detach()` it). So that memory is leaked. Thanks for this very well-written description. It matches my understanding completely, which is comforting. > Your proposed solution is to change `affected_refnames` to duplicate > the strings that are stored in it, in which case > `lock_ref_for_update()` can properly dispose of `referent`, fixing the > leak. This works, but at the price of having to allocate memory for > *all* references in the update, even though most of them are already > fine. Agreed. > But note that `split_symref_update()` *also* passes a copy of > `referent` to `ref_transaction_add_update()`, which *already* makes a > copy of the reference name and adds an entry containing the name to > the `ref_transaction`. If we would store *that* copy to > `affected_refnames`, then it would get freed when the > `ref_transaction` is cleaned up, and we could fix the memory leak > without allocating any new memory. This requires a little reorg of > `split_symref_update()` but it's not too bad: > > * Do the initial check using `string_list_has_string()` rather than > calling `string_list_insert()` already. > * After `new_update` has been created, call `string_list_insert()`, > passing it `new_update->refname` as the string. I haven't looked at the code yet, but from what I remember, this would a) work and b) be a straightforward rearrangement as you say. I'll give this approach a try (unless you want to, of course; just holler). > If this is done in place of your first commit, then your second commit > could be left unchanged. Thanks a lot for your comments. I appreciate it. Martin