Re: [PATCH] fast-import: fix incomplete conversion with multiple mark files

Tibor Billes <tbilles@xxxxxxx> · Mon, 8 Jun 2020 17:52:16 +0200 (CEST)

Hi,

On Sat, 6 Jun 2020, brian m. carlson wrote:

> When ddddf8d7e2 ("fast-import: permit reading multiple marks files",
> 2020-02-22) converted fast-import to handle multiple marks files in
> preparation for submodule support, the conversion was incomplete.  With
> a large number of marks, we would actually modify the marks variable
> even though we had passed in a different variable to operate on.  In
> addition, we didn't consider the fact that the code can replace the mark
> set passed in, so when we did so we happened to leak quite a bit of
> memory, since we never reused the structure we created, instead
> reallocating a new one each time.
>
> It doesn't appear from some testing that we actually produce incorrect
> results in this case, only that we leak a substantial amount of memory.
> To make things work properly and avoid leaking, pass a pointer to
> pointer to struct mark_set, which allows us to modify the set of marks
> when the number of marks is large.
>
> With this patch, importing a dump of git.git with a set of exported
> marks goes from taking in excess of 15 GiB of memory (and being killed
> by the Linux OOM killer) to using a maximum of 1.4 GiB of memory.
>
> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
> Signed-off-by: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx>

Thanks for patching it so quickly! I tested the patch and can confirm that it
solves the memory leak for me.
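
For anyone following along, the pointer-to-pointer idea described in the
commit message can be sketched roughly as below. This is a standalone
illustration and not the actual fast-import code: struct node, FANOUT_BITS,
new_node(), insert() and find() are hypothetical names for a simplified
radix tree. The point is only that insert() may replace the root when an id
is too large for it, so it must write the new root back through the
caller's pointer rather than into some other variable.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified radix tree; not the real struct mark_set. */
#define FANOUT_BITS 4
#define FANOUT (1u << FANOUT_BITS)

struct node {
	unsigned shift;	/* 0 at the leaf level */
	union {
		void *values[FANOUT];
		struct node *children[FANOUT];
	} data;
};

static struct node *new_node(unsigned shift)
{
	struct node *n = calloc(1, sizeof(*n));
	if (!n)
		abort();
	n->shift = shift;
	return n;
}

/*
 * Takes a pointer to the caller's root pointer. If the tree has to grow
 * upward, the new root is stored in *top, so the caller keeps using the
 * grown tree instead of allocating a fresh one on every call.
 */
static void insert(struct node **top, uintmax_t id, void *value)
{
	struct node *s = *top;

	/* Add levels above the root until it can address id. */
	while ((id >> s->shift) >= FANOUT) {
		struct node *parent = new_node(s->shift + FANOUT_BITS);
		parent->data.children[0] = s;
		s = parent;
		*top = s;
	}

	/* Walk down, creating missing intermediate nodes. */
	while (s->shift) {
		uintmax_t i = id >> s->shift;
		id -= i << s->shift;
		if (!s->data.children[i])
			s->data.children[i] = new_node(s->shift - FANOUT_BITS);
		s = s->data.children[i];
	}
	s->data.values[id] = value;
}

static void *find(struct node *s, uintmax_t id)
{
	if ((id >> s->shift) >= FANOUT)
		return NULL;	/* id is outside what this root covers */
	while (s && s->shift) {
		uintmax_t i = id >> s->shift;
		id -= i << s->shift;
		s = s->data.children[i];
	}
	return s ? s->data.values[id] : NULL;
}
```

A caller would keep a single "struct node *root" and always call
insert(&root, id, value). Passing the root by value instead would lose the
grown tree on return, which is the kind of mistake the commit message
describes.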

Thanks,
Tibor Billes