[Re-sending because this computer happened to have plain-text mode turned off for some weird reason, and thus the email bounced] Hi, On Mon, Mar 12, 2018 at 3:19 PM, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote: > > Does this mean that e.g. in this case of merging two files, one > containing "foo" and one containing "bar": > > ( > rm -rf /tmp/test.git && > git init /tmp/test.git && > cd /tmp/test.git && > echo foo >README && > git add README && > git commit -mfoo && > git checkout --orphan trunk && > git reset --hard && > echo bar >README && > git add README && > git commit -mbar && > git merge --allow-unrelated-histories master; > cat README > ) > > That instead of getting: > > <<<<<<< HEAD > bar > ======= > foo > >>>>>>> master > > I'd now get these split into different files? As currently implemented, yes. However, I was more concerned the idea of handling files differently based on whether or not they were similar, rather than on what the precise definition of "similar" is for this context. As far as the definition of similarity goes, estimate_similarity() is currently used by rename detection to compare files recorded at different pathnames. By contrast, in this context, we are comparing two files which were recorded with the same pathname. That suggests the heuristic could be a bit different and use more than just estimate_similarity(). (e.g. "We consider these files similar IF more than 50% of the lines match OR both files are less than 2K.") > I don't mind this being a configurable option if you want it, but I > don't think it should be on by default, reasons: I don't think a configurable option makes sense, at least not for my purposes. Having rename/rename conflicts be "safely" mis-detected as rename/add or add/add, and having rename/add conflicts be "safely" mis-detected as add/add is my overriding concern. Thus, making these three conflict types behave consistently is what I need. Options would make that more difficult for me, and would thus feel like a step backwards. git am/rebase has been doing such mis-detections for years (almost since the "dawn" of git time), but it feels really broken to me because the conflict types aren't handled consistently. (The facts that (a) I'm the only one that has added rename/add testcases to git.git, (b) that I've added all but one of the rename/rename(2to1) testcases to the git.git testsuite, and (c) that rename/add has had multiple bugs for many years, all combine to suggest to me that folks just don't hit those conflict types in practice and thus that they just aren't noticing this breakage -- yet.) I also want to allow such mis-detections for cherry-picks and merges because of the significant (nearly order-of-magnitude in some cases) performance improvements I can get in rename detection if it's allowed. > 1) There's lots of cases where we totally screw up the "is this > similar?" check, in particular with small files. > > E.g. let's say you have a config file like 'fs-path "/tmp/git"' and > in two branches you change that to 'fs-path "/opt/git"' and 'fs-path > "/var/git"'. The rename detection will think this these have nothing > to do with each other since they share no common lines, but to a > human reader they're really similar, and would make sense in the > context of resolving a bigger merge where /{opt,var}/git changes are > conflicting. > > This is not some theoretical concern, there's lots of things that > e.g. use small 5-10 line config files to configure some app that > because of some combo of indentation changes and changing a couple > of lines will make git's rename detection totally give up, but to a > human reader they're 95% the same. Fair enough. The small files case could potentially be handled by just changing the similarity metric for these conflict types, as noted above. If it's a small file, that might be the easiest way for a user to deal with it too. I'm not sure I see the problem with the bigger files, though. If you have bigger files with less than 50% of the lines matching, then you'll essentially end up with a great big conflict block with one file on one side and the other file on the other side, which doesn't seem that different to me than having them be in two separate files. In fact, separate files seems easier to deal with because then the user can run e.g. 'git diff --no-index --color-words FILE1 FILE2', something that they can't do when it's in one file. That has bothered me more than once, and made me wish they were just in separate files. > 2) This will play havoc with already established merge tools on top of > git which a lot of users use instead of manually resolving these in > vi or whatever. > > If we made this the default they'd need to to deal with this new > state, and even if it's not the default we'll have some confused > users wondering why Emacs Ediff or whatever isn't showing the right > thing because it isn't supporting this yet. > > So actually, given that last point in #2 I'm slightly negative on the > whole thing, but maybe splitting it into some new format existing tools > don't understand is compelling enough to justify the downstream breakage. To me, this is a bigger concern. We have changed conflict resolutions in various ways at various times over the years, so I don't think the output should be considered fixed in stone, but I am very sympathetic to arguments that this particular change is too painful. There is a possible way to allow for a transition period, though. Junio has already requested that some of the changes I'm working on be done as a new merge strategy that is a rewrite of merge-recursive (https://public-inbox.org/git/xmqqk1ydkbx0.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx/). It would be a little unfortunate to not be able to use the new merge strategy as an exact drop-in replacement of the current recursive merge strategy, but it would provide a way for people to get some time to migrate other tools. However, I'm far more concerned with the three collision conflict types having consistent behavior than I am with changing add/add conflict handling. And if your two concerns or Jonathan's concern would prevent you from wanting to use the new merge strategy (and possibly prevent it from becoming the default in the future), I'd much rather just modify rename/add and rename/rename to behave like add/add. Would that be more to your liking? Elijah