Re: [PATCH] Properly align memory allocations and temporary buffers

Elijah Newren <newren@xxxxxxxxx> · Fri, 7 Jan 2022 16:18:10 -0800

On Fri, Jan 7, 2022 at 3:30 PM René Scharfe <l.s.r@xxxxxx> wrote:
>
...
> --- >8 ---
> Subject: [PATCH] stable-qsort: avoid using potentially unaligned access
>
> Like in the previous patch for compat/qsort_s.c, remove the optimization
> of using an on-stack buffer to avoid small allocations.  This ensures
> maximum alignment for the array elements and simplifies the code a bit.
>
> The performance impact for the current callers is unlikely to be
> noticeable:
>
>  * compat/mingw.c::make_environment_block() uses ALLOC_ARRAY and
>    ALLOC_GROW several times already, so another allocation of up to 1KB
>    should not matter much.

Not familiar with this code, but that makes sense to me.

>  * diffcore-rename.c::diffcore_rename_extended() is called once per diff
>    or twice per merge, and those require allocations for each object and
>    more already.

Actually, this sort could be skipped entirely if it can determine all
renames early (either because they are found via exact matches, or
they aren't relevant for the merge result, or they are found via
basename comparisons).  This sort is only called when we have to
resort to the full quadratic pairwise comparisons between files, in
which case we're also slurping in full files, doing the diffcore-delta
work to convert each file's contents into a spanhash, and then
pairwise comparing spanhashes for every pairing of source &
destination files that remain.  That work would absolutely dwarf the
malloc of a kilobyte.  So I agree, it's not worth worrying about this
one.

>  * merge-ort.c::detect_and_process_renames() is called once per merge.
>    It's responsible for the two per-merge diffcore_rename_extended()
>    calls mentioned above as well, though.  So this is possibly the most
>    impacted caller.  Per-object allocations are likely to dwarf the
>    additional small allocations in git_stable_qsort(), though.

The particular sort call directly found in
detect_and_process_renames() was once nearly 30% of overall execution
time in merge-ort[1], but according to some local notes I kept, it
eventually dropped to about ~1% of overall execution time after
trivial directory resolution[2] (due to the fact that it could just
include some higher level directories and omit all the files
underneath them -- i.e. it had far fewer paths to sort).

Since I suspect your change here would generally just be a small
percentage of the overall git_stable_qsort() time (if it can even be
measured), and git_stable_qsort() time is a small percentage of
merge-ort runtime, I think any runtime differences here would be
negligible.

[1] https://lore.kernel.org/git/140c1e89e0ec69c5c5e8a99b632c1cf25c2325d4.1623168703.git.gitgitgadget@xxxxxxxxx/
[2] https://lore.kernel.org/git/pull.988.v4.git.1626841444.gitgitgadget@xxxxxxxxx/

>
> Signed-off-by: René Scharfe <l.s.r@xxxxxx>
> ---
>  stable-qsort.c | 16 +++++-----------
>  1 file changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/stable-qsort.c b/stable-qsort.c
> index 6cbaf39f7b..7ff12467cd 100644
> --- a/stable-qsort.c
> +++ b/stable-qsort.c
> @@ -48,15 +48,9 @@ void git_stable_qsort(void *b, size_t n, size_t s,
>                       int (*cmp)(const void *, const void *))
>  {
>         const size_t size = st_mult(n, s);
> -       char buf[1024];
> -
> -       if (size < sizeof(buf)) {
> -               /* The temporary array fits on the small on-stack buffer. */
> -               msort_with_tmp(b, n, s, cmp, buf);
> -       } else {
> -               /* It's somewhat large, so malloc it.  */
> -               char *tmp = xmalloc(size);
> -               msort_with_tmp(b, n, s, cmp, tmp);
> -               free(tmp);
> -       }
> +       char *tmp;
> +
> +       tmp = xmalloc(size);
> +       msort_with_tmp(b, n, s, cmp, tmp);
> +       free(tmp);
>  }
> --
> 2.34.1

Patch looks good to me.  Reviewed-by: Elijah Newren <newren@xxxxxxxxx>