On Mon, Apr 07, 2014 at 10:29:46AM -0700, Junio C Hamano wrote: > Kirill Smelkov <kirr@xxxxxxxxxxxxxx> writes: > > > The following > > ... > > maybe looks a bit simpler, but calls tree_entry_pathcmp twice more times. > > > > Besides for important nparent=1 case we were not calling > > tree_entry_pathcmp at all and here we'll call it once, which would slow > > execution down a bit, as base_name_compare shows measurable enough in profile. > > To avoid that we'll need to add 'if (i==imin) continue' and this won't > > be so simple then. And for general nparent case, as I've said, we'll be > > calling tree_entry_pathcmp twice more times... > > > > Because of all that I'd suggest to go with my original version. > > OK. Thanks. > > ... After some break on the topic, > > with a fresh eye I see a lot of confusion goes from the notation I've > > chosen initially (because of how I was reasoning about it on paper, when > > it was in flux) - i.e. xi for x[imin] and also using i as looping > > variable. And also because xi was already used for x[imin] I've used > > another letter 'k' denoting all other x'es, which leads to confusion... > > > > > > I propose we do the following renaming to clarify things: > > > > A/a -> T/t (to match resulting tree t name in the code) > > X/x -> P/p (to match parents trees tp in the code) > > i -> imin (so that i would be free for other tasks) > > > > then the above (with a prologue) would look like > > > > ---- 8< ---- > > * T P1 Pn > > * - - - > > * |t| |p1| |pn| > > * |-| |--| ... |--| imin = argmin(p1...pn) > > * | | | | | | > > * |-| |--| |--| > > * |.| |. | |. | > > * . . . > > * . . . > > * > > * at any time there could be 3 cases: > > * > > * 1) t < p[imin]; > > * 2) t > p[imin]; > > * 3) t = p[imin]. > > * > > * Schematic deduction of what every case means, and what to do, follows: > > * > > * 1) t < p[imin] -> ∀j t ∉ Pj -> "+t" ∈ D(T,Pj) -> D += "+t"; t↓ > > * > > * 2) t > p[imin] > > * > > * 2.1) ∃j: pj > p[imin] -> "-p[imin]" ∉ D(T,Pj) -> D += ø; ∀ pi=p[imin] pi↓ > > * 2.2) ∀i pi = p[imin] -> pi ∉ T -> "-pi" ∈ D(T,Pi) -> D += "-p[imin]"; ∀i pi↓ > > * > > * 3) t = p[imin] > > * > > * 3.1) ∃j: pj > p[imin] -> "+t" ∈ D(T,Pj) -> only pi=p[imin] remains to investigate > > * 3.2) pi = p[imin] -> investigate δ(t,pi) > > * | > > * | > > * v > > * > > * 3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø -> > > * > > * ⎧δ(t,pi) - if pi=p[imin] > > * -> D += ⎨ > > * ⎩"+t" - if pi>p[imin] > > * > > * > > * in any case t↓ ∀ pi=p[imin] pi↓ > > ... > > now xk is gone and i matches p[i] (= pi) etc so variable names correlate > > to algorithm description better. > > > > Does that maybe clarify things? > > That sounds more consistent (modulo perhaps s/argmin/min/ at the > beginning?). Thanks. argmin is there on purpose - min(p1...pn) is the minimal p, and argmin(p1...pn) is imin such that p[imin] is minimal. As we are finding the index of the minimal element we should use argmin. > > P.S. Sorry for maybe some crept-in mistakes - I've tried to verify it > > thoroughly, but am too sleepy to be completely sure. On the other hand I > > think and hope the patch should be ok. > > Thanks and do not be sorry for "mistakes"---we have the review > process exactly for catching them. Thanks, I appreciate that. On Mon, Apr 07, 2014 at 11:07:47AM -0700, Junio C Hamano wrote: > Kirill Smelkov <kirr@xxxxxxxxxxxxxx> writes: > > >> > + if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER)) { > >> > + for (i = 0; i < nparent; ++i) > >> > + if (tp[i].entry.mode & S_IFXMIN_NEQ) > >> > + goto skip_emit_tp; > >> > + } > >> > + > >> > + p = emit_path(p, base, opt, nparent, > >> > + /*t=*/NULL, tp, imin); > >> > + > >> > + skip_emit_tp: > >> > + /* ∀ xk=ximin xk↓ */ > >> > + update_tp_entries(tp, nparent); > >> > >> There are parents whose path sort earlier than what is in 't' > >> (i.e. they were lost in the result---we would want to show > >> removal). What makes us jump to the skip label? > >> > >> We are looking at path in 't', and some parents have paths that > >> sort earlier than that path. We will not go to skip label if > >> any one of the parent's entry sorts after some other parent (or > >> the parent in question has ran out its entries), which means we > >> show the entry from the parents only when all the parents have > >> that same path, which is missing from 't'. > >> > >> I am not sure if I am reading this correctly, though. > >> > >> For the two-way diff, the above degenerates to "show all parent > >> entries that come before the first entry in 't'", which is correct. > >> For the combined diff, the current intersect_paths() makes sure that > >> each path appears in all the pair-wise diff between t and tp[], > >> which again means that the above logic match the current behaviour. > > > > Yes, correct (modulo we *will* go to skip label if any one of the > > parent's entry sorts after some other parent). By definition of combined > > diff we show a path only if it shows in every diff D(T,Pi), and if > > > > 2.1) ∃j: pj > p[imin] -> "-p[imin]" ∉ D(T,Pj) -> D += ø; ∀ pi=p[imin] pi↓ > > > > some pj sorts after p[imin] that would mean that Pj does not have > > p[imin] and since t > p[imin] (which means T does not have p[imin] > > either) diff D(T,Pj) does not have p[imin]. And because of that we know > > the whole combined-diff will not have p[imin] as, by definition, > > combined diff is sets intersection and one of the sets does not have > > that path. > > > > ( In usual words p[imin] is not changed between Pj..T - it was > > e.g. removed in Pj~, so merging parents to T does not bring any new > > information wrt path p[imin] and that is why we do not want to show > > p[imin] in combined-diff output - no new change about that path ) > > > > So nothing to append to the output, and update minimum tree entries, > > preparing for the next step. > > That's all in line with the current and traditional definition of > combined diff. Ok. > This is a tangent that is outside the scope of this current topic, > but I wonder if you found it disturbing that we treat the result 't' > that has a path and the result 't' that does not have a path with > respect to a parent that does not have the path in a somewhat > assymmetric way. > > With a merge M between commits A and B, where they all have the same > path with different contents, we obviously show that path in the > combined diff format. A merge N that records exactly the same tree > as M that merges the same commits A and B plus another commit C that > does not have that path still shows the combined diff, with one > extra column to express "everything in the result N has been added > with respect to C which did not have the path at all". > > However, a merge O between the same commits A and B, where A and B > have a path and O loses it, shows the path in the combined format. > A merge P among the same A, B and an extra parent C that does not > have that path ceases to show it (this is the assymmetry). Symmetry properties are very important and if something does not fit into symmetry - then something somewhere is really wrong for sure, but I think there is no asymmetry here. In your example M∆∙ N∆∙ / \ . ''' / .'\ ' ' A∆ B∙ Cø \ ''/. ' ' \ / ''' Oø Pø let's say a path can be: ø - empty ∆ - triangle ∙ - bullet ∆∙ - triangle+bullet then some symmetry operation is ∆∙ <-> ø ∙ <-> ∙ ∆ <-> ∆ so you "mirror" ∆∙ to ø (M,N->O,P), and so ø is mirrored back to ∆∙ (O,P->M,N). But then, when we mirror the whole graph, Cø should be mirrored to C'∆∙ and then that would be correct: A∆ B∙ C'∆∙ \ ''/. ' ' \ / ''' Oø Pø P to A,B,C' would show the path as N to A,B,C(without') In other words a merge P among the same A, B and extra parent C' with "contains everything" content is symmetrical to merge N to A,B and C with empty content. > It is a natural extension of "Do not show the path when the result > matches one of the parent" rule, and in this case the result P takes > contents, "the path does not exist", from one parent "C", so it is > internally consistent, and I originally designed it that way on > purpose, but somehow it feels a bit strange. I hope it does not fill strange once the true symmetry is discovered, and also P does not add a change compared to C, so the combined diff should be empty as no new information is present in a merge itself. A bit unusual on the first glance, but not strange, once you are used to it and consistent. Thanks, Kirill -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html