Re: [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Fri, Nov 6, 2020 at 2:52 PM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
> >
> > > +     /*
> > > +      * Note: We only label files with df_conflict, not directories.
> > > +      * Since directories stay where they are, and files move out of the
> > > +      * way to make room for a directory, we don't care if there was a
> > > +      * directory/file conflict for a parent directory of the current path.
> > > +      */
> > > +     unsigned df_conflict = (filemask != 0) && (dirmask != 0);
> >
> > Suppose you have:
> >
> >  [ours]
> >   foo/
> >     bar/
> >       baz
> >     quux
> >  [theirs]
> >   foo
> >
> > By "we only label files with df_conflict, not directories", are you
> > referring to not labelling "foo/" in [ours], or to "bar/", "baz", and
> > "quux" (so, the files and directories within a directory)? At first I
> > thought you were referring to the former, but perhaps you are referring
> > to the latter.
> 
> The former.  I was drawing a distinction between how this code
> operates, and how unpack_trees() operates, which probably only matters
> to those familiar with unpack_trees() or who have been reading through
> it recently.

Just for clarification: do you mean "the latter"? (The "not" in my
question might be confusing.)

To be more illustrative in what I meant, at first I thought that you
were NOT labelling "foo/" in [ours], hence:

 [ours]
  foo/  <- unlabeled
 [theirs]
  foo   <- labeled

In this way, in a sense, you are indeed labelling only the file, not the
directory.

But instead what you seem to be doing is this:

 [ours]
  foo/     <- labeled
    bar/   <- unlabeled
      baz  <- unlabeled
    quux   <- unlabeled
 [theirs]
  foo      <- labeled

which is what I meant by NOT labelling "bar/", "baz", and "quux".

> unpack_trees() will note when there is a directory/file
> conflict, and propagates that information to all subtrees, with every
> path specially checking for the o->df_conflict_entry and then handling
> it specially (e.g. keeping higher order stages instead of using an
> aggressive or trivial resolutions).

And here it seems like you're describing that unpack_trees() would label
it in this way:

 [ours]
  foo/     <- labeled
    bar/   <- labeled
      baz  <- labeled
    quux   <- labeled
 [theirs]
  foo      <- labeled

(and you're emphasizing by contrast that merge-ort is NOT doing this).

> However, leaving both a file and
> a directory at the same path, while allowed in the index, makes for
> ugliness and difficulty for users to resolve.   Plus it isn't allowed
> in the working tree anyway.  We decided a while ago that it'd be
> better to represent these conflicts differently[1], [2].
> 
> Also, at the time you are unpacking or traversing trees, you only know
> if one side had a directory where the other side had a file.  You
> don't know if the final merge result will actually have a
> directory/file conflict.  If the file existed in both the base version
> and unmodified on one side, for example, then the file will be removed
> as part of the merge.  It is similarly possible that the entire
> directory of files all need to be deleted or are all renamed
> elsewhere.  So, you have to keep track of a df_conflict bit, but you
> can't act on it until you've processed several other things first.
> 
> Since I already know I'm not going to move a whole directory of files
> out of the way so that a file can be placed in the working tree
> instead of that whole directory, the directory doesn't need to be
> tweaked.  I'm not going to propagate any information about a
> directory/file conflict at some path down to all subpaths of the
> directory.  I only track it for the file that immediately conflicts,
> and then only take action on it after resolving all the paths under
> the corresponding directory to see if the directory/file conflict
> remains.
> 
> [1] https://lore.kernel.org/git/xmqqbmabcuhf.fsf@xxxxxxxxxxxxxxxxxxxxxxxxx/
> and the thread surrounding it
> [2] https://lore.kernel.org/git/f27f12e8e50e56c010c29caa00296475d4de205b.1603731704.git.gitgitgadget@xxxxxxxxx/,
> which is now commit ef52778708 ("merge tests: expect improved
> directory/file conflict handling in ort", 2020-10-26)

Makes sense.

> > > @@ -161,6 +179,13 @@ static int collect_merge_info_callback(int n,
> > >               newinfo.name = p->path;
> > >               newinfo.namelen = p->pathlen;
> > >               newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
> > > +             /*
> > > +              * If we did care about parent directories having a D/F
> > > +              * conflict, then we'd include
> > > +              *    newinfo.df_conflicts |= (mask & ~dirmask);
> > > +              * here.  But we don't.  (See comment near setting of local
> > > +              * df_conflict variable near the beginning of this function).
> > > +              */
> >
> > I'm not sure how "mask" and "dirmask" contains information about parent
> > directories. "mask" represents the available entries, and "dirmask"
> > represents which of them are directories, as far as I know. So we can
> > notice when something is missing, but I don't see how this distinguishes
> > between the case that something is missing because it was in a parent
> > directory that got deleted, vs something is missing because it itself
> > got deleted.
> 
> Yeah, this is more comparisons to unpack_trees.  This code is about to
> set up a recursive call into subdirectories.  newinfo is set based on
> the mask and dirmask of the current entry, and then subdirectories can
> consult newinfo.df_conflicts to see if that path is within a directory
> that was involved in a directory/file conflict.  For example:
> 
> Tree in base version:
>     foo/
>         bar
>     stuff.txt
> Tree on side 1: (adds foo/baz)
>     foo/
>         bar
>         baz
>     stuff.txt
> Tree on side 2: (deletes foo/, adds new file foo)
>    foo
>    stuff.txt
> 
> When processing 'foo', we have mask=7, dirmask = 3.  So, here
> unpack_trees() would have set newinfo.df_conflicts = (mask & ~dirmask)
> = 4.  Then when we process foo/bar or foo/baz, we have mask=2,
> dirmask=0, which looks like there are no directory/file conflicts.
> However, we can note that these paths are under a directory involved
> in a directory/file conflict via info.df_conflicts whose value is 4.
> unpack_trees() cared about paths under a directory that was involved
> in a directory/file conflict, and someone familiar with that code
> might ask why I don't also track the same information.  The answer is
> that I don't track it, even though I thought about it, because it's
> useless overhead since I'm going to leave the directory alone and move
> the file out of the way.
> 
> Does that make sense?

Ah...yes, that makes sense. I think I didn't notice the "newinfo", so I
didn't realize that we were setting the info of our children based on
ourselves. Perhaps I would have noticed it sooner if the comment had
read "If this file/directory cared about its parent directory (the
current directory) having a D/F conflict, then we'd propagate the masks
in this way:" instead of "If we did care about parent directories having
a D/F conflict", but perhaps the point is already obvious enough.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux