On Wed, Sep 12, 2018 at 11:18 AM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > Stefan Beller <sbeller@xxxxxxxxxx> writes: > > > We can string_list_insert() to maintain sorted-ness of the > > list as we find new items, or we can string_list_append() to > > build an unsorted list and sort it at the end just once. > > > > To pick which one is more appropriate, we notice the fact > > that we discover new items more or less in the already > > sorted order. That makes "append then sort" more > > appropriate. > > Sorry, but I still do not get the math you are implying in the > second paragraph. Are you saying that append-then-sort is efficient > when items being appended is already sorted? That depends on the > sorting algorithm used, so the logic is incomplete unless you say > "given that we use X for sorting,...", I think. > > Do we really discover new items in sorted order, by the way? In a > single diff invocation made inside collect_changed_submodules() for > one commit in the superproject's history, we will grab changed paths > in the pathname order (i.e. sorted); if the superproject's tip commit > touches the submodules at paths A and Z, we will discover these two > paths in sorted order. > > But because we are walking the superproject's history to collect all > paths that have been affected in that function, and repeatedly > calling diff as we discover commit in the superproject's history, I > am not sure how well the resulting set of paths would be sorted. > > The tip commit in superproject's history may have modified the > submodule at path X, the parent of that commit may have touched the > submodule at path M, and its parent may have touched the submodule > at path A. Don't we end up grabbing these paths in that discoverd > order, i.e. X, M and A? That is true. > > I still think changing it from "insert as we find an item, keeping > the list sorted" to "append all and then sort before we start > looking things up from the result" makes sense, but I do not think > the "we find things in sorted order" is either true, or it would > affect the choice between the two. A justification to choose the > latter I can think of that makes sense is that we don't have to pay > cost to keep the list sorted while building it because we do not do > any look-up while building the list. ok. Thanks, Stefan