On Fri, Jan 7, 2022 at 3:30 PM René Scharfe <l.s.r@xxxxxx> wrote: > ... > --- >8 --- > Subject: [PATCH] stable-qsort: avoid using potentially unaligned access > > Like in the previous patch for compat/qsort_s.c, remove the optimization > of using an on-stack buffer to avoid small allocations. This ensures > maximum alignment for the array elements and simplifies the code a bit. > > The performance impact for the current callers is unlikely to be > noticeable: > > * compat/mingw.c::make_environment_block() uses ALLOC_ARRAY and > ALLOC_GROW several times already, so another allocation of up to 1KB > should not matter much. Not familiar with this code, but that makes sense to me. > * diffcore-rename.c::diffcore_rename_extended() is called once per diff > or twice per merge, and those require allocations for each object and > more already. Actually, this sort could be skipped entirely if it can determine all renames early (either because they are found via exact matches, or they aren't relevant for the merge result, or they are found via basename comparisons). This sort is only called when we have to resort to the full quadratic pairwise comparisons between files, in which case we're also slurping in full files, doing the diffcore-delta work to convert each file's contents into a spanhash, and then pairwise comparing spanhashes for every pairing of source & destination files that remain. That work would absolutely dwarf the malloc of a kilobyte. So I agree, it's not worth worrying about this one. > * merge-ort.c::detect_and_process_renames() is called once per merge. > It's responsible for the two per-merge diffcore_rename_extended() > calls mentioned above as well, though. So this is possibly the most > impacted caller. Per-object allocations are likely to dwarf the > additional small allocations in git_stable_qsort(), though. The particular sort call directly found in detect_and_process_renames() was once nearly 30% of overall execution time in merge-ort[1], but according to some local notes I kept, it eventually dropped to about ~1% of overall execution time after trivial directory resolution[2] (due to the fact that it could just include some higher level directories and omit all the files underneath them -- i.e. it had far fewer paths to sort). Since I suspect your change here would generally just be a small percentage of the overall git_stable_qsort() time (if it can even be measured), and git_stable_qsort() time is a small percentage of merge-ort runtime, I think any runtime differences here would be negligible. [1] https://lore.kernel.org/git/140c1e89e0ec69c5c5e8a99b632c1cf25c2325d4.1623168703.git.gitgitgadget@xxxxxxxxx/ [2] https://lore.kernel.org/git/pull.988.v4.git.1626841444.gitgitgadget@xxxxxxxxx/ > > Signed-off-by: René Scharfe <l.s.r@xxxxxx> > --- > stable-qsort.c | 16 +++++----------- > 1 file changed, 5 insertions(+), 11 deletions(-) > > diff --git a/stable-qsort.c b/stable-qsort.c > index 6cbaf39f7b..7ff12467cd 100644 > --- a/stable-qsort.c > +++ b/stable-qsort.c > @@ -48,15 +48,9 @@ void git_stable_qsort(void *b, size_t n, size_t s, > int (*cmp)(const void *, const void *)) > { > const size_t size = st_mult(n, s); > - char buf[1024]; > - > - if (size < sizeof(buf)) { > - /* The temporary array fits on the small on-stack buffer. */ > - msort_with_tmp(b, n, s, cmp, buf); > - } else { > - /* It's somewhat large, so malloc it. */ > - char *tmp = xmalloc(size); > - msort_with_tmp(b, n, s, cmp, tmp); > - free(tmp); > - } > + char *tmp; > + > + tmp = xmalloc(size); > + msort_with_tmp(b, n, s, cmp, tmp); > + free(tmp); > } > -- > 2.34.1 Patch looks good to me. Reviewed-by: Elijah Newren <newren@xxxxxxxxx>