On Thu, Dec 30, 2021 at 02:04:30PM -0800, Elijah Newren wrote: > > enqueue(file_pair_for(extra_headers[j])) > > The queue is an array of sorted items, so enqueue here would be > insertion into an already sorted list. Inserting N items into a list > of M items is quadratic (O(N*M)) -- unless you meant to just append to > the end and add a third sort at the end? yeah I would have probably used a third sort > > > j++ > > At the end of the for loop, there may be remaining additional headers > that sort after all those found in the queue, so you'll need an > additional loop to handle those. my bad, I should have tried it > It's actually considerably more code as you can see from the diffstat, > and feels like we're reaching into some ugly internals with tmp_queue > (the SWAP and the special-case freeing) in order to get the desired > performance improvements. And it was already O(NlogN) overall (due to > the sort), which doesn't change with this new algorithm. It's really, > really hard for me to imagine a case where we have large numbers of > additional headers. Even if someone else can imagine that we for some > reason have a huge number of conflicts in order to generate a huge > number of additional headers...how could the performance of sorting > O(N) filenames and merging these lists possibly matter in comparison > to the O(N) three-way file merges that would likely have been > performed from those conflicts? Yeah, I agree with that conclusion, it's surely not worth the added complexity. Seeing the code definitely helps, thanks. > > So, I'm going to throw this code away and keep the original. > > It was an interesting idea and exercise; thanks for keeping me on my toes.