On Thu, Feb 13, 2025 at 10:30 AM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > Elijah Newren <newren@xxxxxxxxx> writes: > > > (As a side note, due to the specialized structure of the input, I > > suspect this code could be modified to run in O(n), i.e. we could skip > > the string_list_lookup and the string_list_sort and the > > string_list_remove_duplicates... > > Are you talking about the input being already sorted so we can just > walk the multiple input and merge them into a single stream? In the I'm not sure what you mean by "merge them into a single stream". I think you have the right idea that we are creating a string list of information about unmerged entries, and since we're taking information from the index which is already sorted, we can just either modify the last entry in the list if it matches or append a new entry to it; no need to walk, insert, or binary search the list at all. > cost analysis you did earlier in the message I am responding to, > being able to go down to O(n) sounds really like a great thing ;-) Note first that we aren't going from O(n^2) -> O(n), we're only going from O(n log n) -> O(n). That's still great, but: * n is typically pretty small (number of unmerged files) * there's things in merge-recursive that are O(m^2), where typically m >> n (number of files in repo, or number of lines in big files in the repo) * merge-recursive is used by almost no one * we are planning to delete merge-recursive So, although O(n) is great.... > > But, it'd make the code trickier, so > > it'd need to be carefully commented, the change would need to be > > justified, and it'd need to be carefully tested. > > ... and measured. +1 > > Even if we weren't > > planning to delete this entire file, I suspect it's not possible to > > find a case justifying such a change without optimizing several other > > things in merge-recursive first, but optimizing those things probably > > results in a significant rewrite...which we've already done with > > merge-ort.) > > Sounds like unless the performance issues are shared between the > two, it may not be worth to spend too much brain cycles only on the > "recursive" one? ...yep, exactly, and this is not a performance issue shared with the ort backend; it's unique to the recursive one.