Re: [PATCH] git-apply: fix --3way with binary patch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 28, 2021 at 12:38 PM Jerry Zhang <jerry@xxxxxxxxxx> wrote:
>
> On Tue, Jul 27, 2021 at 9:30 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >
> > Jerry Zhang <jerry@xxxxxxxxxx> writes:
> >
> > > Binary patches applied with "--3way" will
> > > always return a conflict even if the patch
> > > should cleanly apply because the low level
> > > merge function considers all binary merges
> > > without a variant to be conflicting.
> > >
> > > Fix by falling back to normal patch application
> > > for all binary patches.
> > >
> > > Add tests for --3way and normal applications
> > > of binary patches.
> > >
> > > Fixes: 923cd87ac8 ("git-apply: try threeway first when "--3way" is used")
> > > Signed-off-by: Jerry Zhang <jerry@xxxxxxxxxx>
> > > ---
> > >  apply.c                   |  3 ++-
> > >  t/t4108-apply-threeway.sh | 45 +++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 47 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/apply.c b/apply.c
> > > index 1d2d7e124e..78e52f0dc1 100644
> > > --- a/apply.c
> > > +++ b/apply.c
> > > @@ -3638,7 +3638,8 @@ static int apply_data(struct apply_state *state, struct patch *patch,
> > >       if (load_preimage(state, &image, patch, st, ce) < 0)
> > >               return -1;
> > >
> > > -     if (!state->threeway || try_threeway(state, &image, patch, st, ce) < 0) {
> > > +     if (!state->threeway || patch->is_binary ||
> > > +             try_threeway(state, &image, patch, st, ce) < 0) {
> >
> > Thanks for a quick turnaround.  However.
> >
> > Because apply.c::three_way_merge() calls into ll_merge() that lets
> > the low-level custom merge drivers to take over the actual merge, I
> > do not think your "if binary, bypass and never call try_threway() at
> > all" is the right solution.  The custom merge driver user uses for
> > the path may successfully perform such a "trivial" three-way merge
> > and return success.
> I understand now, thanks for the explanation
> >
> > Why does the current code that lets threeway tried first fails to
> > fall back to direct application?  The code before your change, if
> > fed a binary patch that does not apply, would have failed the direct
> > application first *and* then fell back to the threeway (if only to
> > fail because we do not let binary files be merged), no?
> >
> > Is it that try_threeway()'s way to express failure slightly
> > different from how direct application reports failure, but your
> > change used the same "only if it is negative, we fail and fallback"
> > logic?  IIRC, apply_fragments() which is the meat of the direct
> > application logic reports failures by negative, but try_threeway()
> > can return positive non-zero to signal a "recoverable" failure (aka
> > "conflicted merge").  Which should lead us to explore a different
> > approach, which is ...
> >
> >     Would it be possible for a patch to leave conflicts when
> >     try_threeway() was attempted, but will cleanly apply if direct
> >     application is done?
> >
> > If so, perhaps
> >
> >  - we first run try_threeway() and see if it cleanly resolves; if
> >    so, we are done.
> >
> >  - then we try direct application and see if it cleanly applies; if
> >    so, we are done.
> >
> >  - finally we run try_threeway() again and let it fail with
> >    conflict.
> >
> > might be the right sequence?  We theoretically could omit the first
> > of these three steps, but that would mean we'd write 923cd87a
> > (git-apply: try threeway first when "--3way" is used, 2021-04-06)
> > off as a failed experiment and revert it, which would not be ideal.
> >
> >
> > Also, independent from this "if we claim we try threeway first and
> > fall back to direct application, we really should do so" fix we are
> > discussing, I think our default binary merge can be a bit more
> > lenient and resolve this particular case of applying the binary
> > patch taken from itself (i.e. a patch that takes A to B gets applied
> > using --3way option to A).  I wonder if it can be as simple as the
> > attached patch.  FWIW, this change is sufficient (without the change
> > to apply.c we are reviewing here) to make your new tests in t4108
> > pass.
> So basically, another way of stating the problem would be that binary
> patches can apply cleanly with direct application in some cases where
> merge application is not clean. If i understand correctly this is unique
> to binary files, although it would be possible for a user to supply a custom
> merge driver for text files that is worse than direct application, that is
> most likely heavy user error that we shouldn't have to cater to. However
> the issue with binary is that the *default* merge driver is actually worse
> than direct application (in some cases). Therefore our options are
>
> 1. do as you suggest and run 3way -> direct -> 3way. I would modify
> this and say we should only attempt this for binary patches, since a text
> file that fails 3way would most likely also fail direct, so it would be a waste
> of time to try it. furthermore if we cache results from the first 3way and
> return them after attempting direct, it can save us from having to compute
> the 3way twice, so would be no worse than our current performance.
>
> 2. improve the default binary merge driver to be at least as good as direct
> application. this would allow us to say overall that "merge drivers should
> be at least as intelligent as direct patch application" and would greatly
> simplify logic in apply.c. Your change is a good first step in allowing it
> to handle more cases. A trivial way to make the binary merge driver
> at least as good as patch application is to generate a patch and apply
> it as part of the merge. I imagine this would have other consequences
> though as many parts of git use the binary merge driver.

Hmm I would have thought that binary patches allow context, similar to this
test snippet
"
 test_expect_success 'apply complex binary file patch' '
     git reset --hard main &&
     cp $TEST_DIRECTORY/test-binary-1.png bin.png &&
     git add bin.png &&
     git commit -m "add binary file" &&

     echo 1 >>bin.png &&
     git diff --binary >bin.diff &&
     git reset --hard &&

     cat $TEST_DIRECTORY/test-binary-2.png
$TEST_DIRECTORY/test-binary-1.png >bin.png &&
     git add bin.png &&
     git commit -m "change binary file" &&

     # Apply must succeed.
     git apply bin.diff
 '
"
but upon running it I see that normal patch application still requires the
preimage to match exactly.
"
error: the patch applies to 'bin.png'
(836481bd1b9b6bd7a1bb8939cf4ea01e05946850), which does not match the
current contents.
error: bin.png: patch does not apply
"

So at least in regards to making the default binary merge driver "at
least as intelligent" as
direct patch application, your patch ought to do it.

>
> Separately I think it would be a worthwhile follow-up patch to also handle
> trivial three-way merges in try_threeway(). This would:
> 1. Allow us to compare oid instead of the entire file buffer, which would be
> faster.
> 2. Handle trivial merges of all file types, which would save time.
>
> >
> > ---- >8 ------- >8 ------- >8 ------- >8 ------- >8 ------- >8 ----
> > Subject: ll-merge: teach ll_binary_merge() a trivial three-way merge
> >
> > The low-level binary merge code assumed that the caller will not
> > feed trivial merges that would have been resolved at the tree level;
> > because of this, ll_binary_merge() assumes the ancestor is different
> > from either side, always failing the merge in conflict unless -Xours
> > or -Xtheirs is in effect.
> >
> > But "git apply --3way" codepath could ask us to perform three-way
> > merge between two binaries A and B using A as the ancestor version.
> > The current code always fails such an application, but when given a
> > binary patch that turns A into B and asked to apply it to A, there
> > is no reason to fail such a request---we can trivially tell that the
> > result must be B.
> >
> > Arguably, this fix may belong to one level higher at ll_merge()
> > function, which dispatches to lower-level merge drivers, possibly
> > even before it renormalizes the three input buffers.  But let's
> > first see how this goes.
> >
> > Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
> > ---
> >  ll-merge.c | 56 +++++++++++++++++++++++++++++++++++++++-----------------
> >  1 file changed, 39 insertions(+), 17 deletions(-)
> >
> > diff --git c/ll-merge.c w/ll-merge.c
> > index 261657578c..bc8038d404 100644
> > --- c/ll-merge.c
> > +++ w/ll-merge.c
> > @@ -46,6 +46,13 @@ void reset_merge_attributes(void)
> >         merge_attributes = NULL;
> >  }
> >
> > +static int same_mmfile(mmfile_t *a, mmfile_t *b)
> > +{
> > +       if (a->size != b->size)
> > +               return 0;
> > +       return !memcmp(a->ptr, b->ptr, a->size);
> > +}
> > +
> >  /*
> >   * Built-in low-levels
> >   */
> > @@ -58,9 +65,18 @@ static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
> >                            const struct ll_merge_options *opts,
> >                            int marker_size)
> >  {
> > +       int status;
> >         mmfile_t *stolen;
> >         assert(opts);
> >
> > +       /*
> > +        * With -Xtheirs or -Xours, we have cleanly merged;
> > +        * otherwise we got a conflict, unless 3way trivially
> > +        * resolves.
> > +        */
> > +       status = (opts->variant == XDL_MERGE_FAVOR_OURS ||
> > +                 opts->variant == XDL_MERGE_FAVOR_THEIRS) ? 0 : 1;
> > +
> >         /*
> >          * The tentative merge result is the common ancestor for an
> >          * internal merge.  For the final merge, it is "ours" by
> > @@ -68,18 +84,30 @@ static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
> >          */
> >         if (opts->virtual_ancestor) {
> >                 stolen = orig;
> > +               status = 0;
> >         } else {
> > -               switch (opts->variant) {
> > -               default:
> > -                       warning("Cannot merge binary files: %s (%s vs. %s)",
> > -                               path, name1, name2);
> > -                       /* fallthru */
> > -               case XDL_MERGE_FAVOR_OURS:
> > -                       stolen = src1;
> > -                       break;
> > -               case XDL_MERGE_FAVOR_THEIRS:
> > +               if (same_mmfile(orig, src1)) {
> >                         stolen = src2;
> > -                       break;
> > +                       status = 0;
> > +               } else if (same_mmfile(orig, src2)) {
> > +                       stolen = src1;
> > +                       status = 0;
> > +               } else if (same_mmfile(src1, src2)) {
> > +                       stolen = src1;
> > +                       status = 0;
> > +               } else {
> > +                       switch (opts->variant) {
> > +                       default:
> > +                               warning("Cannot merge binary files: %s (%s vs. %s)",
> > +                                       path, name1, name2);
> > +                               /* fallthru */
> > +                       case XDL_MERGE_FAVOR_OURS:
> > +                               stolen = src1;
> > +                               break;
> > +                       case XDL_MERGE_FAVOR_THEIRS:
> > +                               stolen = src2;
> > +                               break;
> > +                       }
> >                 }
> >         }
> >
> > @@ -87,13 +115,7 @@ static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
> >         result->size = stolen->size;
> >         stolen->ptr = NULL;
> >
> > -       /*
> > -        * With -Xtheirs or -Xours, we have cleanly merged;
> > -        * otherwise we got a conflict.
> > -        */
> > -       return opts->variant == XDL_MERGE_FAVOR_OURS ||
> > -              opts->variant == XDL_MERGE_FAVOR_THEIRS ?
> > -              0 : 1;
> > +       return status;
> >  }
> >
> >  static int ll_xdl_merge(const struct ll_merge_driver *drv_unused,



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux