Re: [PATCH v2 01/20] merge-ort: setup basic internal data structures

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 6, 2020 at 2:05 PM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
>
> I'm not very familiar with the merge machinery, but I'll attempt a
> review of at least the first 10 patches.

Thanks for taking a look!

> > merged_info contains all relevant information for a non-conflicted
> > entry.  conflict_info contains a merged_info, plus any additional
> > information about a conflict such as the higher orders stages involved
> > and the names of the paths those came from (handy once renames get
> > involved).  If an entry remains conflicted, the merged_info portion of a
> > conflict_info will later be filled with whatever version of the file
> > should be placed in the working directory (e.g. an as-merged-as-possible
> > variation that contains conflict markers).
>
> I think that this information should be in the .c file.

Okay.

> > diff --git a/merge-ort.c b/merge-ort.c
> > index b487901d3e..9d5ea0930d 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -17,6 +17,46 @@
> >  #include "cache.h"
> >  #include "merge-ort.h"
> >
> > +#include "strmap.h"
> > +
> > +struct merge_options_internal {
> > +     struct strmap paths;    /* maps path -> (merged|conflict)_info */
> > +     struct strmap unmerged; /* maps path -> conflict_info */
> > +     const char *current_dir_name;
> > +     int call_depth;
> > +};
>
> Maybe document if the path is from the root of the directory or just the
> filename as it appears in a tree object?

Yeah, full relative path from toplevel.  I'll try to add some
documentation to that effect.

> I would have expected "unmerged" to be a "strset", but I guess it's more
> convenient for it to be a map itself. Maybe just document it as "all
> mappings in paths wherein the value is a struct conflict_info".

Makes sense.  And yeah, it's not a strset just because of the simple
optimization to avoid needing to do another lookup in paths afterwards
to get the actual conflict_info; the only time it is used is as a loop
over the still-unmerged entries to try to three-way merge them and
such.

> There seems to be 2 ways of referring to something that we couldn't
> merge - "conflicted" (or "having a conflict") and "unmerged". Should we
> stick to one of them?

Uhm...perhaps, but it feels like I'm going to miss some while looking
over it.  Also, there are some semantic differences at play.  Since
the whole algorithm is divided around multiple stages --
collect_merge_info(), detect_and_process_renames(), process_entries(),
as of a given early stage we might just know that we couldn't merge it
*yet*.  For example, both sides modified the entry, or one side
modified and the other side is missing ("did they delete it or rename
it?").  After rename detection and three-way content merge, something
that had not been automatically mergeable as of an earlier step might
end up being so.  But we need names for stuff in the interim state.
But it's also possible for us to know at an early state that thing are
definitely going to be a conflict regardless of what later stages do
(e.g. both sides rename a path, but rename it differently).

> Also, looking ahead, I see that current_dir_name is used as a temporary
> variable in the recursive calls to collect_merge_info_callback(). I
> would prefer if current_dir_name went in the cbdata to that function
> instead, but if that's not possible, maybe document here that
> current_dir_name is only used in collect_merge_info_callback(), and
> temporarily at that.

Yeah, not possible.  collect_merge_info_callback() has to be of
traverse_callback_t type (from tree-walk.h), which provides no extra
parameters for extra callback data.  I can add a documentation
comment.

> > +struct version_info {
> > +     struct object_id oid;
> > +     unsigned short mode;
> > +};
>
> OK.
>
> > +struct merged_info {
> > +     struct version_info result;
> > +     unsigned is_null:1;
> > +     unsigned clean:1;
> > +     size_t basename_offset;
> > +      /*
> > +       * Containing directory name.  Note that we assume directory_name is
> > +       * constructed such that
> > +       *    strcmp(dir1_name, dir2_name) == 0 iff dir1_name == dir2_name,
> > +       * i.e. string equality is equivalent to pointer equality.  For this
> > +       * to hold, we have to be careful setting directory_name.
> > +       */
> > +     const char *directory_name;
> > +};
>
> I'm not sure how most of the fields in this struct are to be used, but
> perhaps that will be clearer once I read the subsequent code.
>
> > +struct conflict_info {
> > +     struct merged_info merged;
> > +     struct version_info stages[3];
> > +     const char *pathnames[3];
>
> Why would these be different across stages? (Rename detection?)

Yes, as noted in the portion of the commit message that you said you
wanted in the .c file.

>
> > +     unsigned df_conflict:1;
>
> OK.
>
> > +     unsigned path_conflict:1;
>
> This doesn't seem to be assigned anywhere in this patch set?

Oh, right, I could drop it for now and add it back later when it is
used.  It's basically non-content conflict other than the specialized
D/F conflict.  So, things like rename/delete, moved by directory
rename, rename/rename(1to2), and rename/add/delete.  I could have
potentially lumped it in with df_conflict or made df_conflict a
subset, but df_conflict is different enough from the others that it
got a special flag.

But yeah, since it's all rename-related stuff and this patchset
doesn't have any rename handling yet, I probably should have left it
out until I added that stuff later.

> > +     unsigned filemask:3;
> > +     unsigned dirmask:3;
>
> I wonder if this needs to be documented that the least significant bit
> corresponds to stages[0], and so forth.

Maybe I should just put a comment to look at tree-walk.h?  The struct
traverse_info has a "fn" member with a big comment above it describing
mask & dirmask; filemask is just mask & ~dirmask.

> > +     unsigned match_mask:3;
>
> I think this can be derived by just looking at the stages array? Maybe
> document as:
>
>   Optimization to track which stages match. Either 0 or at least 2 bits
>   are set; if at least 2 bits are set, their corresponding stages match.

Yep, I only wanted to compute the match_mask once (I always got
annoyed in merge-recursive.c that we were re-comparing what had
already been compared and computed within unpack_trees()).  I like
your suggested comment; will add it.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux