On Thu, May 27, 2021 at 6:32 PM Taylor Blau <me@xxxxxxxxxxxx> wrote: > > On Thu, May 27, 2021 at 08:37:17AM +0000, Elijah Newren via GitGitGadget wrote: > > -static int string_list_df_name_compare(const char *one, const char *two) > > +static int sort_dirs_next_to_their_children(const void *a, const void *b) > > I looked at the new implementation of this function (and > df_name_compare() to double check it) and am convinced that it's > correctness, with the exception of one question. > > Some thoughts I had while trying to make sure this was all OK: > > > + const char *one = ((struct string_list_item *)a)->string; > > + const char *two = ((struct string_list_item *)b)->string; > > + unsigned char c1, c2; > > + > > + while (*one && (*one == *two)) { > > + one++; > > + two++; > > + } > > Advancing 'one' and 'two' to point at either the end of 'a' (and the > same position within 'b'), or the first place where the two have > different characters. If 'b' is shorter than 'a', '*one != *two' will > terminate the loop (since '*two' will be NUL, and '*one' will not). > > > + c1 = *one; > > + if (!c1) > > + c1 = '/'; > > + > > + c2 = *two; > > + if (!c2) > > + c2 = '/'; > > Store off the last character of each, or '/' if we got to the end. Hmm, > is this right (the guard in 'df_name_compare()' read 'if (!c1 && > S_ISDIR(mode1))'). Suppose both strings were "foo", then both c1 and c2 > would be "/", and I think we would return -1. > > That doesn't seem quite right to me. I think it *would* be right if we > checked the mode of each entry before assigning c1 or c2 to '/', > though. (Being generally unfamiliar with this area, I haven't looked to > see whether getting access to the modes of each entry at this point is > easy or not). Good reasoning; I should have been clearer about one of the assumptions that this function operates under which precludes that possibility. > > + > > + if (c1 == c2) { > > + /* Getting here means one is a leading directory of the other */ Your example case of both strings being "foo" obviously conflicts with this comment; but the comment is correct. This function will never be given two equal strings because it is called to sort the keys of a strmap, and strmap keys are unique by construction. (If one side of history has "foo" as a directory, and the other side has "foo" as a path, then there is still only one "foo" in opt->priv->paths. Every entry in opt->priv->paths records 3 hashes and modes and whatnot in order to know what each side of history had at the given path.) Also, even in that case (when two strings are equal), getting the right return value would only matter if we cared about a stable sort. But we call this function with QSORT, not STABLE_QSORT. Interestingly, this function technically doesn't even need to fully sort the array either. For example, if you took the output of 'git ls-tree -rt HEAD' and permuted the order of files within the same directory, that kind of level of quasi-sorted would be good enough for my purposes; I just need files underneath the same directory to be "together" and the containing directory to appear immediately before those. There is a later call to write_tree() that will hande sorting within a single directory to ensure fully-sorted-ness. Unfortunately, I don't know of a way to take advantage of that less strict sorting requirement for this point of the code to improve the performance further, so this function just implemented "pretend that every path has a '/' appended and then fully sort them". > > + return (*one) ? 1 : -1; > > + } > > > + else > > I did find this spacing awkward (would have expected '} else' or even '} > else {'), but it hardly matters. I'll fix that up while adding some comments about the purpose of the function and the fact that it assumes there will be no duplicates in the array being sorted.