On Thu, May 27, 2021 at 2:00 PM René Scharfe <l.s.r@xxxxxx> wrote: > > Am 27.05.21 um 10:37 schrieb Elijah Newren via GitGitGadget: > > From: Elijah Newren <newren@xxxxxxxxx> > > > > Gathering accumulated times from trace2 output on the mega-renames > > testcase, I saw the following timings (where I'm only showing a few > > lines to highlight the portions of interest): > > > > 10.120 : label:incore_nonrecursive > > 4.462 : ..label:process_entries > > 3.143 : ....label:process_entries setup > > 2.988 : ......label:plist special sort > > 1.305 : ....label:processing > > 2.604 : ..label:collect_merge_info > > 2.018 : ..label:merge_start > > 1.018 : ..label:renames > > > > In the above output, note that the 4.462 seconds for process_entries was > > split as 3.143 seconds for "process_entries setup" and 1.305 seconds for > > "processing" (and a little time for other stuff removed from the > > highlight). Most of the "process_entries setup" time was spent on > > "plist special sort" which corresponds to the following code: > > > > trace2_region_enter("merge", "plist special sort", opt->repo); > > plist.cmp = string_list_df_name_compare; > > string_list_sort(&plist); > > trace2_region_leave("merge", "plist special sort", opt->repo); > > > > In other words, in a merge strategy that would be invoked by passing > > "-sort" to either rebase or merge, sorting an array takes more time than > > anything else. Serves me right for naming my merge strategy this way. > > > > Rewrite the comparison function and remove as many levels of indirection > > as possible (e.g. the old code had > > cmp_items() -> > > string_list_df_name_compare() -> > > df_name_compare() > > now we just have sort_dirs_next_to_their_children()), and tweak it to be > > as optimized as possible for our specific case. These changes reduced > > the time spent in "plist special sort" by ~25% in the mega-renames case. > > > > For the testcases mentioned in commit 557ac0350d ("merge-ort: begin > > performance work; instrument with trace2_region_* calls", 2020-10-28), > > this change improves the performance as follows: > > > > Before After > > no-renames: 5.622 s ± 0.059 s 5.235 s ± 0.042 s > > mega-renames: 10.127 s ± 0.073 s 9.419 s ± 0.107 s > > just-one-mega: 500.3 ms ± 3.8 ms 480.1 ms ± 3.9 ms > > Interesting. > > > > > Signed-off-by: Elijah Newren <newren@xxxxxxxxx> > > --- > > merge-ort.c | 64 ++++++++++++++++++++++++++++++++++------------------- > > 1 file changed, 41 insertions(+), 23 deletions(-) > > > > diff --git a/merge-ort.c b/merge-ort.c > > index 142d44d74d63..367aec4b7def 100644 > > --- a/merge-ort.c > > +++ b/merge-ort.c > > @@ -2746,31 +2746,50 @@ static int detect_and_process_renames(struct merge_options *opt, > > > > /*** Function Grouping: functions related to process_entries() ***/ > > > > -static int string_list_df_name_compare(const char *one, const char *two) > > +static int sort_dirs_next_to_their_children(const void *a, const void *b) > > { > > - int onelen = strlen(one); > > - int twolen = strlen(two); > > The old code scans both strings fully, while the new one stops when it > reaches a difference and doesn't look at any further characters. How > much does that contribute to the speedup? (I suspect a lot.) Oh, indeed, good catch. It appears to be responsible for essentially all of it. > > /* > > - * Here we only care that entries for D/F conflicts are > > - * adjacent, in particular with the file of the D/F conflict > > - * appearing before files below the corresponding directory. > > - * The order of the rest of the list is irrelevant for us. > > + * Here we only care that entries for directories appear adjacent > > + * to and before files underneath the directory. In other words, > > + * we do not want the natural sorting of > > + * foo > > + * foo.txt > > + * foo/bar > > + * Instead, we want "foo" to sort as though it were "foo/", so that > > + * we instead get > > + * foo.txt > > + * foo > > + * foo/bar > > + * To achieve this, we basically implement our own strcmp, except that > > + * if we get to the end of either string instead of comparing NUL to > > + * another character, we compare '/' to it. > > * > > - * To achieve this, we sort with df_name_compare and provide > > - * the mode S_IFDIR so that D/F conflicts will sort correctly. > > - * We use the mode S_IFDIR for everything else for simplicity, > > - * since in other cases any changes in their order due to > > - * sorting cause no problems for us. > > + * The reason to not use df_name_compare directly was that it was > > + * just too expensive, so I had to reimplement it. > > */ > > - int cmp = df_name_compare(one, onelen, S_IFDIR, > > - two, twolen, S_IFDIR); > > - /* > > - * Now that 'foo' and 'foo/bar' compare equal, we have to make sure > > - * that 'foo' comes before 'foo/bar'. > > - */ > > - if (cmp) > > - return cmp; > > - return onelen - twolen; > > + const char *one = ((struct string_list_item *)a)->string; > > + const char *two = ((struct string_list_item *)b)->string; > > Casting away const, hmm. :-/ Harmless because no actual write is > attempted, but still looks needlessly scary to me. Right, that should have been + const char *one = ((const struct string_list_item *)a)->string; + const char *two = ((const struct string_list_item *)b)->string; but since I was just assigning to a const char * on those lines, I'm not sure why it'd qualify as scary. Regardless, I'm happy to put these consts back in. > > + unsigned char c1, c2; > > + > > + while (*one && (*one == *two)) { > > + one++; > > + two++; > > + } > > + > > + c1 = *one; > > + if (!c1) > > + c1 = '/'; > > + > > + c2 = *two; > > + if (!c2) > > + c2 = '/'; > > + > > + if (c1 == c2) { > > + /* Getting here means one is a leading directory of the other */ > > + return (*one) ? 1 : -1; > > + } > > + else > > + return c1-c2; > > } > > > > static int read_oid_strbuf(struct merge_options *opt, > > @@ -3481,8 +3500,7 @@ static void process_entries(struct merge_options *opt, > > trace2_region_leave("merge", "plist copy", opt->repo); > > > > trace2_region_enter("merge", "plist special sort", opt->repo); > > - plist.cmp = string_list_df_name_compare; > > - string_list_sort(&plist); > > + QSORT(plist.items, plist.nr, sort_dirs_next_to_their_children); > > How much does the direct use of QSORT instead of string_list_sort() > contribute to the speedup? (I suspect only a bit.) Yep, I'll fix up the commit message.