Am 07.01.22 um 22:30 schrieb Junio C Hamano: > René Scharfe <l.s.r@xxxxxx> writes: > >> If falling back to malloc(3) is fine in general then I wonder if we can >> just it use always. This would save two branches and make buffer >> management trivial here. How much worse is malloc(3) on platforms with >> alloca(3)? Do we sort lots of short lists somewhere? In other words: >> Does this stack allocation optimization actually make a measurable >> difference? > > Well all the preimage of this came from your 04ee8b87 (compat: add > qsort_s(), 2017-01-22), so you tell me ;-) Right, except I stole the code almost verbatim from compat/qsort.c, which had this optimization since 43fe901b71 (compat: Add simplified merge sort implementation from glibc, 2008-02-05). :) The optimization may have raised my eyebrow a bit at the time, but not enough to come up with a meaningful benchmark.. https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/msort.c (the original) still uses alloca(3) (and more tricks), by the way; https://cgit.freebsd.org/src/tree/lib/libc/stdlib/merge.c still doesn't. > >> Heap fragmention should not be a concern here, at least, because the >> pattern of requesting, using and releasing a single allocation won't >> leave holes. > > Sure. Even in a multi-threaded environment that should be true. > > ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----- > Subject: compat/qsort_s.c: avoid using potentially unaligned access > > The compatibility definition for qsort_s() uses "char buffer[1024]" > on the stack to avoid making malloc() calls for small temporary > space, which essentially hand-rolls alloca(). > > But the elements of the array being sorted may have alignment needs > more strict than what an array of bytes may have. &buf[0] may be > word aligned, but using the address as if it stores the first > element of an array of a struct, whose first member may need to be > aligned on double-word boundary, would be a no-no. > > We could use xalloca() from git-compat-util.h, or alloca() directly > on platforms with HAVE_ALLOCA_H, but let's try using unconditionally > xmalloc() before we know the performance characteristics of the > callers. > > It may not make much of an argument to inspect the current callers > and say "it shouldn't matter to any of them", but anyway: > > * The one in object-name.c is used to sort potential matches to a > given ambiguous object name prefix in the error path; > > * The one in pack-write.c is done once per a pack .idx file being > written to create the reverse index, so (1) the cost of malloc() > overhead is dwarfed by the cost of the packing operation, and (2) > the number of entries being sorted is the number of objects in a > pack; > > * The one in ref-filter.c is used by "branch --list", "tag --list", > and "for-each-ref", only once per operation. We sort an array of > pointers with entries, each corresponding to a ref that is shown. > > * The one in string-list.c is used by sort_string_list(), which is > way too generic to assume any access patterns, so it may or may > not matter, but I do not care too much ;-) > > Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > --- > compat/qsort_s.c | 14 ++++---------- > 1 file changed, 4 insertions(+), 10 deletions(-) > > diff --git c/compat/qsort_s.c w/compat/qsort_s.c > index 52d1f0a73d..0f7ff30f5f 100644 > --- c/compat/qsort_s.c > +++ w/compat/qsort_s.c > @@ -49,21 +49,15 @@ int git_qsort_s(void *b, size_t n, size_t s, > int (*cmp)(const void *, const void *, void *), void *ctx) > { > const size_t size = st_mult(n, s); > - char buf[1024]; > + char *tmp; > > if (!n) > return 0; > if (!b || !cmp) > return -1; > > - if (size < sizeof(buf)) { > - /* The temporary array fits on the small on-stack buffer. */ > - msort_with_tmp(b, n, s, cmp, buf, ctx); > - } else { > - /* It's somewhat large, so malloc it. */ > - char *tmp = xmalloc(size); > - msort_with_tmp(b, n, s, cmp, tmp, ctx); > - free(tmp); > - } > + tmp = xmalloc(size); > + msort_with_tmp(b, n, s, cmp, tmp, ctx); > + free(tmp); > return 0; > } --- >8 --- Subject: [PATCH] stable-qsort: avoid using potentially unaligned access Like in the previous patch for compat/qsort_s.c, remove the optimization of using an on-stack buffer to avoid small allocations. This ensures maximum alignment for the array elements and simplifies the code a bit. The performance impact for the current callers is unlikely to be noticeable: * compat/mingw.c::make_environment_block() uses ALLOC_ARRAY and ALLOC_GROW several times already, so another allocation of up to 1KB should not matter much. * diffcore-rename.c::diffcore_rename_extended() is called once per diff or twice per merge, and those require allocations for each object and more already. * merge-ort.c::detect_and_process_renames() is called once per merge. It's responsible for the two per-merge diffcore_rename_extended() calls mentioned above as well, though. So this is possibly the most impacted caller. Per-object allocations are likely to dwarf the additional small allocations in git_stable_qsort(), though. Signed-off-by: René Scharfe <l.s.r@xxxxxx> --- stable-qsort.c | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/stable-qsort.c b/stable-qsort.c index 6cbaf39f7b..7ff12467cd 100644 --- a/stable-qsort.c +++ b/stable-qsort.c @@ -48,15 +48,9 @@ void git_stable_qsort(void *b, size_t n, size_t s, int (*cmp)(const void *, const void *)) { const size_t size = st_mult(n, s); - char buf[1024]; - - if (size < sizeof(buf)) { - /* The temporary array fits on the small on-stack buffer. */ - msort_with_tmp(b, n, s, cmp, buf); - } else { - /* It's somewhat large, so malloc it. */ - char *tmp = xmalloc(size); - msort_with_tmp(b, n, s, cmp, tmp); - free(tmp); - } + char *tmp; + + tmp = xmalloc(size); + msort_with_tmp(b, n, s, cmp, tmp); + free(tmp); } -- 2.34.1