Am 27.05.21 um 10:37 schrieb Elijah Newren via GitGitGadget: > From: Elijah Newren <newren@xxxxxxxxx> > > Gathering accumulated times from trace2 output on the mega-renames > testcase, I saw the following timings (where I'm only showing a few > lines to highlight the portions of interest): > > 10.120 : label:incore_nonrecursive > 4.462 : ..label:process_entries > 3.143 : ....label:process_entries setup > 2.988 : ......label:plist special sort > 1.305 : ....label:processing > 2.604 : ..label:collect_merge_info > 2.018 : ..label:merge_start > 1.018 : ..label:renames > > In the above output, note that the 4.462 seconds for process_entries was > split as 3.143 seconds for "process_entries setup" and 1.305 seconds for > "processing" (and a little time for other stuff removed from the > highlight). Most of the "process_entries setup" time was spent on > "plist special sort" which corresponds to the following code: > > trace2_region_enter("merge", "plist special sort", opt->repo); > plist.cmp = string_list_df_name_compare; > string_list_sort(&plist); > trace2_region_leave("merge", "plist special sort", opt->repo); > > In other words, in a merge strategy that would be invoked by passing > "-sort" to either rebase or merge, sorting an array takes more time than > anything else. Serves me right for naming my merge strategy this way. > > Rewrite the comparison function and remove as many levels of indirection > as possible (e.g. the old code had > cmp_items() -> > string_list_df_name_compare() -> > df_name_compare() > now we just have sort_dirs_next_to_their_children()), and tweak it to be > as optimized as possible for our specific case. These changes reduced > the time spent in "plist special sort" by ~25% in the mega-renames case. > > For the testcases mentioned in commit 557ac0350d ("merge-ort: begin > performance work; instrument with trace2_region_* calls", 2020-10-28), > this change improves the performance as follows: > > Before After > no-renames: 5.622 s ± 0.059 s 5.235 s ± 0.042 s > mega-renames: 10.127 s ± 0.073 s 9.419 s ± 0.107 s > just-one-mega: 500.3 ms ± 3.8 ms 480.1 ms ± 3.9 ms Interesting. > > Signed-off-by: Elijah Newren <newren@xxxxxxxxx> > --- > merge-ort.c | 64 ++++++++++++++++++++++++++++++++++------------------- > 1 file changed, 41 insertions(+), 23 deletions(-) > > diff --git a/merge-ort.c b/merge-ort.c > index 142d44d74d63..367aec4b7def 100644 > --- a/merge-ort.c > +++ b/merge-ort.c > @@ -2746,31 +2746,50 @@ static int detect_and_process_renames(struct merge_options *opt, > > /*** Function Grouping: functions related to process_entries() ***/ > > -static int string_list_df_name_compare(const char *one, const char *two) > +static int sort_dirs_next_to_their_children(const void *a, const void *b) > { > - int onelen = strlen(one); > - int twolen = strlen(two); The old code scans both strings fully, while the new one stops when it reaches a difference and doesn't look at any further characters. How much does that contribute to the speedup? (I suspect a lot.) > /* > - * Here we only care that entries for D/F conflicts are > - * adjacent, in particular with the file of the D/F conflict > - * appearing before files below the corresponding directory. > - * The order of the rest of the list is irrelevant for us. > + * Here we only care that entries for directories appear adjacent > + * to and before files underneath the directory. In other words, > + * we do not want the natural sorting of > + * foo > + * foo.txt > + * foo/bar > + * Instead, we want "foo" to sort as though it were "foo/", so that > + * we instead get > + * foo.txt > + * foo > + * foo/bar > + * To achieve this, we basically implement our own strcmp, except that > + * if we get to the end of either string instead of comparing NUL to > + * another character, we compare '/' to it. > * > - * To achieve this, we sort with df_name_compare and provide > - * the mode S_IFDIR so that D/F conflicts will sort correctly. > - * We use the mode S_IFDIR for everything else for simplicity, > - * since in other cases any changes in their order due to > - * sorting cause no problems for us. > + * The reason to not use df_name_compare directly was that it was > + * just too expensive, so I had to reimplement it. > */ > - int cmp = df_name_compare(one, onelen, S_IFDIR, > - two, twolen, S_IFDIR); > - /* > - * Now that 'foo' and 'foo/bar' compare equal, we have to make sure > - * that 'foo' comes before 'foo/bar'. > - */ > - if (cmp) > - return cmp; > - return onelen - twolen; > + const char *one = ((struct string_list_item *)a)->string; > + const char *two = ((struct string_list_item *)b)->string; Casting away const, hmm. :-/ Harmless because no actual write is attempted, but still looks needlessly scary to me. > + unsigned char c1, c2; > + > + while (*one && (*one == *two)) { > + one++; > + two++; > + } > + > + c1 = *one; > + if (!c1) > + c1 = '/'; > + > + c2 = *two; > + if (!c2) > + c2 = '/'; > + > + if (c1 == c2) { > + /* Getting here means one is a leading directory of the other */ > + return (*one) ? 1 : -1; > + } > + else > + return c1-c2; > } > > static int read_oid_strbuf(struct merge_options *opt, > @@ -3481,8 +3500,7 @@ static void process_entries(struct merge_options *opt, > trace2_region_leave("merge", "plist copy", opt->repo); > > trace2_region_enter("merge", "plist special sort", opt->repo); > - plist.cmp = string_list_df_name_compare; > - string_list_sort(&plist); > + QSORT(plist.items, plist.nr, sort_dirs_next_to_their_children); How much does the direct use of QSORT instead of string_list_sort() contribute to the speedup? (I suspect only a bit.) > trace2_region_leave("merge", "plist special sort", opt->repo); > > trace2_region_leave("merge", "process_entries setup", opt->repo); >