Re: [RFC PATCH 2/2] merge-recursive: optimize time complexity for get_unmerged

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 13, 2025 at 10:30 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Elijah Newren <newren@xxxxxxxxx> writes:
>
> > (As a side note, due to the specialized structure of the input, I
> > suspect this code could be modified to run in O(n), i.e. we could skip
> > the string_list_lookup and the string_list_sort and the
> > string_list_remove_duplicates...
>
> Are you talking about the input being already sorted so we can just
> walk the multiple input and merge them into a single stream?  In the

I'm not sure what you mean by "merge them into a single stream".  I
think you have the right idea that we are creating a string list of
information about unmerged entries, and since we're taking information
from the index which is already sorted, we can just either modify the
last entry in the list if it matches or append a new entry to it; no
need to walk, insert, or binary search the list at all.

> cost analysis you did earlier in the message I am responding to,
> being able to go down to O(n) sounds really like a great thing ;-)

Note first that we aren't going from O(n^2) -> O(n), we're only going
from O(n log n) -> O(n).  That's still great, but:

  * n is typically pretty small (number of unmerged files)
  * there's things in merge-recursive that are O(m^2), where typically
m >> n (number of files in repo, or number of lines in big files in
the repo)
  * merge-recursive is used by almost no one
  * we are planning to delete merge-recursive

So, although O(n) is great....

> > But, it'd make the code trickier, so
> > it'd need to be carefully commented, the change would need to be
> > justified, and it'd need to be carefully tested.
>
> ... and measured.

+1

> > Even if we weren't
> > planning to delete this entire file, I suspect it's not possible to
> > find a case justifying such a change without optimizing several other
> > things in merge-recursive first, but optimizing those things probably
> > results in a significant rewrite...which we've already done with
> > merge-ort.)
>
> Sounds like unless the performance issues are shared between the
> two, it may not be worth to spend too much brain cycles only on the
> "recursive" one?

...yep, exactly, and this is not a performance issue shared with the
ort backend; it's unique to the recursive one.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux