On Mon, Sep 28, 2020 at 5:02 PM Matheus Tavares
<matheus.bernardino@xxxxxx> wrote:
>
> The third phase of unpack_entry() performs the following sequence in a
> loop, until all the deltas enumerated in phase one are applied and the
> entry is fully reconstructed:
>
>   1. Add the current base entry to the delta base cache
>   2. Unpack the next delta
>   3. Patch the unpacked delta on top of the base
>
> When the optional object reading lock is enabled, the above steps will
> be performed while holding the lock. However, step 2. momentarily
> releases it so that inflation can be performed in parallel for
> increased performance. Because the `base` buffer inserted in the cache
> at 1. is not duplicated, another thread can potentially free() it while
> the lock is released at 2. (e.g. when there is no space left in the
> cache to insert another entry). In this case, the later attempt to
> dereference `base` at 3. will cause a segmentation fault. This problem
> was observed during a multithreaded git-grep execution on a repository
> with large objects.
>
> To fix the race condition (and the subsequent segmentation fault),
> let's reorder the aforementioned steps so that `base` is only added to
> the cache at the end. This will prevent the buffer from being released
> by another thread while it is still in use. An alternative solution,
> which would not require the reordering, would be to duplicate `base`
> before inserting it in the cache. However, as Phil Hord mentioned,
> memcpy()'ing large bases can negatively affect performance: in his
> experiments, this alternative approach slowed git-grep down by 10% to
> 20%.
>
> Reported-by: Phil Hord <phil.hord@xxxxxxxxx>
> Signed-off-by: Matheus Tavares <matheus.bernardino@xxxxxx>
> ---

Thanks for looking after this so quickly. This all looks good to me, and
I confirmed it does fix the problems I was seeing.

Phil