On Tue, Nov 14, 2017 at 1:11 PM, Elijah Newren <newren@xxxxxxxxx> wrote: > On Mon, Nov 13, 2017 at 4:42 PM, Stefan Beller <sbeller@xxxxxxxxxx> wrote: >> On Fri, Nov 10, 2017 at 11:05 AM, Elijah Newren <newren@xxxxxxxxx> wrote: > >> "In my opinion" ... sounds like commit message? > > Sure, I can move it there. > > >>> +# Testcase 7a, rename-dir vs. rename-dir (NOT split evenly) PLUS add-other-file >>> +# Commit A: z/{b,c} >>> +# Commit B: y/{b,c} >>> +# Commit C: w/b, x/c, z/d >>> +# Expected: y/d, CONFLICT(rename/rename for both z/b and z/c) >>> +# NOTE: There's a rename of z/ here, y/ has more renames, so z/d -> y/d. >> >> But the creator of C intended to have z/d, not {w,x}/d, and as {w,x} == y, >> I am not sure I like this result. (I have no concrete counter example, just >> messy logic) > > I'm open to alternative interpretations here. The biggest issue for > me -- going back our discussion at the end of > https://public-inbox.org/git/CABPp-BFKiam6AK-Gg_RzaLuLur-jz0kvv3TqsHNHg5+HTv_uzA@xxxxxxxxxxxxxx/ > -- is "simple, predictable rule", which is consistent with the other > rules and limits the number of nasty corner cases as much as possible. > Perhaps you think this is one of those nasty corner cases, and that's > fair, but I think it'd be hard to do much better. yup, I agree that a simple, predictable rule is better than optimizing for the corner cases we can come up now. (I sent this email before reading your reply, so thanks for re-iterating the answer.) > After spending quite a while trying to think of any other alternative > rules or ways of looking at this, I could only come up with two > points: > > 1) One could view this as a case where commit C didn't in fact do > any directory rename -- note that directory z/ still exists in that > commit. Thus, only B did a rename, it renamed z/ -> y/, thus C's z/d > should be moved to y/d. So, this choice is consistent with the other > rules we've got. I wonder if we can do a data driven approach, i.e. mine some history (linux, git, other large projects), that would tell us which of these cases happens very often, and which of these corner cases can be "safely ignored because it never happens". My gut feeling tells me that splitting up a directory into two or three (potential sub-)directories is a common thing, whereas double renames are not as often. (But that's just my view, I have no data to back it up; the selection of the data would open the next debate as it will be very specific to a given community. But as linux has had such a huge impact on git, I'd be tempted to claim any study on linux.git is fruitful for gits defaults) > 2) An alternate (or maybe additional?) rule: We could decide that if > a source path is renamed on both sides of history, then we'll just > ignore both renames for consideration of directory rename detection. > > The new rule idea would "fix" this testcase to your liking, although > now we'd be somewhat inconsistent with the "directory still exists > implies no directory rename occurred rule". But what other weirdness > could entail? Here's a few I've thought of: > > Commit O: z/{b,c,d} > Commit A: y/{b,c} > Commit B: z/{newb, newc, e} > > Here, A renamed z/ -> y/. .. while deleting d ... > Except B renamed z/b and z/c differently, ... and z/d -> z/e, maybe(?). > so all paths used to detect the z/ -> y/ rename are ignored, so there > isn't a rename after all. I'm not so sure I like that decision. > Let's keep looking though, and change it up a bit more: > > Commit O: z/{b,c,d} > Commit A: y/{b,c}, x/d > Commit B: z/{newb, newc, d, e} > > Here, A has a split rename. Since B renames z/b and z/c differently, > we have to ignore the z/ -> y/ rename, and thus the only rename left > implies z/ -> x/. Thus we'd end up with z/e getting moved into x/e. > Seems weird to me, and less likely that a user would understand this > rule than the "majority wins" one. It still is "majority wins" except the set of inspected files is filtered first. In an *ideal, but expensive* algorithm, we might give different weights to files, e.g. * large files have more weight than smaller files, * files with interesting names have more weight (c.f. Makefile vs. xstrbuf.c) * similar files have more weight than files that are rewritten, or rather the more rewrite is done the less impact one file has. * how unique file content is (LICENSE.txt that exists 23 times in the tree has less weight than the sekret-algorithm.c) and depending on these weights we have a "majority of content" moved to y/ or x/. >>> +# Testcase 7c, rename/rename(1to...2or3); transitive rename may add complexity >>> +# (Related to testcases 3b and 5c) >>> +# Commit A: z/{b,c}, x/d >>> +# Commit B: y/{b,c}, w/d >>> +# Commit C: z/{b,c,d} >>> +# Expected: y/{b,c}, CONFLICT(x/d -> w/d vs. y/d) >> >> CONFLICT(x/d -> y/d vs w/d) ? > > I'm afraid I'm not following the question. Yesterday I had the impression the renaming perspective changed, note the difference in order of y/ and w/ inside the CONFLICT. I might have been confused already, though. >>> +# Testcase 7e, transitive rename in rename/delete AND dirs in the way >>> +# (Very similar to 'both rename source and destination involved in D/F conflict' from t6022-merge-rename.sh) >>> +# (Also related to testcases 9c and 9d) >>> +# Commit A: z/{b,c}, x/d_1 >>> +# Commit B: y/{b,c,d/g}, x/d/f >>> +# Commit C: z/{b,c,d_1} >>> +# Expected: rename/delete(x/d_1->y/d_1 vs. None) + D/F conflict on y/d >>> +# y/{b,c,d/g}, y/d_1~C^0, x/d/f >>> +# NOTE: x/d/f may be slightly confusing here. x/d_1 -> z/d_1 implies >>> +# there is a directory rename from x/ -> z/, performed by commit C. >>> +# However, on the side of commit B, it renamed z/ -> y/, thus >>> +# making a rename from x/ -> z/ when it was getting rid of z/ seems >>> +# non-sensical. Further, putting x/d/f into y/d/f also doesn't >>> +# make a lot of sense because commit B did the renaming of z to y >>> +# and it created x/d/f, and it clearly made these things separate, >>> +# so it doesn't make much sense to push these together. >> >> This is confusing. > > Indeed it is. When I first wrote this testcase, I didn't realize that > I actually had two potentially directory renames involved and a > doubly-transitive rename from it, on top of the D/F conflict. I can > see two ways to resolve this. > > 1) Leave the testcase alone, just try to make the NOTE more clear: > > NOTE: The main path of interest here is d_1 and where it ends up, but > this is actually a case that has two potential directory renames > involved and D/F conflict(s), so it makes sense to walk through each > step. Commit B renames z/ -> y/. Thus everything that C adds to z/ > should be instead moved to y/. This gives us the D/F conflict on y/d > because x/d_1 -> z/d_1 -> y/d_1 conflicts with y/d/g. Further, commit > C renames x/ -> z/, thus everything B adds to x/ should instead be > moved to z/...BUT we removed z/ and renamed it to y/, so maybe > everything should move not from x/ to z/, but from x/ to z/ to y/. > Doing so might make sense from the logic so far, but note that commit > B had both an x/ and a y/; it did the renaming of z/ to y/ and created > x/d/f and it clearly made these things separate, so it doesn't make > much sense to push these together. Doing so is what I'd call a doubly > transitive rename; see testcases 9c and 9d for further discussion of > this issue and how it's resolved. > > 2) Modify the testcase so it doesn't have two potential directory > renames involved. Just add another unrelated file under x/ that > doesn't change on either side, thus removing the x/ -> z/ rename from > the mix. That wouldn't actually change the expected result (other > than the new file should remain around), but it would change the > reasoning and simplify it: > > NOTE: Commit B renames z/ -> y/. Thus everything that C adds to z/ > should be instead moved to y/. This gives us the D/F conflict on y/d > because x/d_1 -> z/d_1 -> y/d_1 conflicts with y/d/g. As a side note, > one could imagine an alternative implementation trying to resolve D/F > conflicts caused by renames by just undoing the rename, but in this > case that would end up with us needing to write an x/d_1, which would > still be a D/F conflict with x/d/f. What do you want to test in 7e? AFAICT section 7 is about "More involved Edge/Corner cases", so keeping it edge sounds fine. (hence I'd vote for (1), just adjusting the notes)