Re: [PATCH 02/11] merge-ort: add initial outline for basic rename detection

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Sorry for two different email responses to the same email...

Addressing the comments on this patchset mean re-submitting
en/merge-ort-impl, and causing conflicts in en/merge-ort-2 and this
series en/merge-ort-3.  Since gitgitgadget will not allow me to submit
patches against a series that isn't published by Junio, I'll need to
ask Junio to temporarily drop both of these series, then later
resubmit en/merge-ort-2 after he publishes my updates to
en/merge-ort-impl.  Then when he publishes my updates to
en/merge-ort-2, I'll be able to submit my already-rebased patches for
en/merge-ort-3.

A couple extra comments below...

On Thu, Dec 10, 2020 at 6:39 PM Derrick Stolee <stolee@xxxxxxxxx> wrote:
>
> On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@xxxxxxxxx>
> >
> > Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
> > ---
> >  merge-ort.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 60 insertions(+), 8 deletions(-)
> >
> > diff --git a/merge-ort.c b/merge-ort.c
> > index 90baedac407..92b765dd3f0 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -617,20 +617,72 @@ static int handle_content_merge(struct merge_options *opt,
> >
> >  /*** Function Grouping: functions related to regular rename detection ***/
> >
> > +static int process_renames(struct merge_options *opt,
> > +                        struct diff_queue_struct *renames)
> > +static int compare_pairs(const void *a_, const void *b_)
> > +/* Call diffcore_rename() to compute which files have changed on given side */
> > +static void detect_regular_renames(struct merge_options *opt,
> > +                                struct tree *merge_base,
> > +                                struct tree *side,
> > +                                unsigned side_index)
> > +static int collect_renames(struct merge_options *opt,
> > +                        struct diff_queue_struct *result,
> > +                        unsigned side_index)
>
> standard "I promise this will follow soon!" strategy, OK.
>
> >  static int detect_and_process_renames(struct merge_options *opt,
> >                                     struct tree *merge_base,
> >                                     struct tree *side1,
> >                                     struct tree *side2)
> >  {
> > -     int clean = 1;
> > +     struct diff_queue_struct combined;
> > +     struct rename_info *renames = opt->priv->renames;
>
> (Re: my concerns that we don't need 'renames' to be a pointer,
> this could easily be "renames = &opt->priv.renames;")
>
> > +     int s, clean = 1;
> > +
> > +     memset(&combined, 0, sizeof(combined));
> > +
> > +     detect_regular_renames(opt, merge_base, side1, 1);
> > +     detect_regular_renames(opt, merge_base, side2, 2);
>
> Find the renames in each side's diff.
>
> I think the use of "1" and "2" here might be better situated
> for an enum. Perhaps:
>
> enum merge_side {
>         MERGE_SIDE1 = 0,
>         MERGE_SIDE2 = 1,
> };
>
> (Note, I shift these values to 0 and 1, respectively, allowing
> us to truncate the pairs array to two entries while still
> being mentally clear.)

So, after mulling it over for a while, I created a

enum merge_side {
    MERGE_BASE = 0,
    MERGE_SIDE1 = 1,
    MERGE_SIDE2 = 2
};

and I made use of it in several places.  I just avoided going to an
extreme with it (e.g. adding another enum for masks or changing all
possibly relevant variables from ints to enum merge_side), and used it
more as a document-when-values-are-meant-to-refer-to-sides-of-the-merge
kind of thing.  Of course, this affects two previous patchsets and not
just this one, so I'll have to post a _lot_ of new patches...   :-)

> > +
> > +     ALLOC_GROW(combined.queue,
> > +                renames->pairs[1].nr + renames->pairs[2].nr,
> > +                combined.alloc);
> > +     clean &= collect_renames(opt, &combined, 1);
> > +     clean &= collect_renames(opt, &combined, 2);
>
> Magic numbers again.
>
> > +     QSORT(combined.queue, combined.nr, compare_pairs);
> > +
> > +     clean &= process_renames(opt, &combined);
>
> I need to mentally remember that "clean" is a return state,
> but _not_ a fail/success result. Even though we are using
> "&=" here, it shouldn't be "&&=" or even "if (method()) return 1;"
>
> Looking at how "clean" is used in struct merge_result, I
> wonder if there is a reason to use an "int" over a simple
> "unsigned" or even "unsigned clean:1;" You use -1 in places
> as well as a case of "mi->clean = !!resolved;"

Something I missed in my reply yesterday...

Note that mi->clean is NOT from struct merge_result.  It is from
struct merged_info, and in that struct it IS defined as "unsigned
clean:1", i.e. it is a true boolean.  The merged_info.clean field is
used to determine whether a specific path merged cleanly.

"clean" from struct merge_result is whether the entirety of the merge
was clean or not.  It's almost a boolean, but allows for a
"catastrophic problem encountered" value.  I added the following
comment:
/*
* Whether the merge is clean; possible values:
*    1: clean
*    0: not clean (merge conflicts)
*   <0: operation aborted prematurely.  (object database
*       unreadable, disk full, etc.)  Worktree may be left in an
*       inconsistent state if operation failed near the end.
*/

This also means that I either abort and return a negative value, or I
can continue treating merge_result's "clean" field as a boolean.

But again, this isn't new to this patchset; it affects the patchset
before the patchset before this one.

> If there is more meaning to values other than "clean" or
> "!clean", then an enum might be valuable.
>
> > +     /* Free memory for renames->pairs[] and combined */
> > +     for (s = 1; s <= 2; s++) {
> > +             free(renames->pairs[s].queue);
> > +             DIFF_QUEUE_CLEAR(&renames->pairs[s]);
> > +     }
>
> This loop is particularly unusual. Perhaps it would be
> better to do this instead:
>
>         free(renames->pairs[MERGE_SIDE1].queue);
>         free(renames->pairs[MERGE_SIDE2].queue);
>         DIFF_QUEUE_CLEAR(&renames->pairs[MERGE_SIDE1]);
>         DIFF_QUEUE_CLEAR(&renames->pairs[MERGE_SIDE2]);
>
> > +     if (combined.nr) {
> > +             int i;
> > +             for (i = 0; i < combined.nr; i++)
> > +                     diff_free_filepair(combined.queue[i]);
> > +             free(combined.queue);
> > +     }
> >
> > -     /*
> > -      * Rename detection works by detecting file similarity.  Here we use
> > -      * a really easy-to-implement scheme: files are similar IFF they have
> > -      * the same filename.  Therefore, by this scheme, there are no renames.
> > -      *
> > -      * TODO: Actually implement a real rename detection scheme.
> > -      */
> >       return clean;
>
> I notice that this change causes detect_and_process_renames() to
> change from an "unhelpful result, but success" to "die() always".
>
> I wonder if there is value in swapping the order of the patches
> to implement the static methods first. Of course, you hit the
> "unreferenced static method" problem, so maybe your strategy is
> better after all.
>
> Thanks,
> -Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux