> My main concerns: > - Can your version reach outside of a diff chunk? Currently no. It's optimised for reformatting and renaming, both of which preserve ordering. I could look into allowing disordered matches where the similarity is high, while still being biased towards ordered matches. If you can post more examples that would be helpful. > - Complexity and possibly performance. The recursive stuff made me > wonder about it a bit. It's no reason not to use it, just need to check > it more closely. Complexity I can't deny, I can only mitigate it with documentation/comments. I optimised the code pretty heavily and tested on some contrived worst-case scenarios and the performance was still good so I'm not worried about that. > Is the latest version of your stuff still the one you posted last week > or so? Yes. But reaching outside the chunk might lead to a significantly different API in the next version... > Similarly, do you think the "two pass" approach I have (check the chunk, > then check the parent file) would work with your recursive partitioning > style? That might make yours able to handle the "split diff chunk" case. Yes, should do. I'll see what I can come up with this week. > No algorithm will work for all cases. The one you just gave had the > simple heuristic working better than a complex one. We could make it > more complex, but then another example may be worse. I can live with > some inaccuracy in exchange for simplicity. Exactly, no algorithm will work for all cases. So what I'm suggesting is that it might be best to let the user choose which heuristic is appropriate for a given commit. If they know that the simple heuristic works best then perhaps we should let them choose that rather than only offering a one-size-fits-all option. But if we do want to go for one-size-fits-all then I'm very keen to make sure it at least solves the specific cases that we know about.