Re: [PATCH 1/3] merge-tree -z: always show the original file name first

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all (and I guess in particular Elijah),

On Fri, 19 Aug 2022, Johannes Schindelin via GitGitGadget wrote:

> diff --git a/t/t4069-remerge-diff.sh b/t/t4069-remerge-diff.sh
> index 35f94957fce..bc580a242ac 100755
> --- a/t/t4069-remerge-diff.sh
> +++ b/t/t4069-remerge-diff.sh
> @@ -131,11 +131,12 @@ test_expect_success 'setup non-content conflicts' '
>  test_expect_success 'remerge-diff with non-content conflicts' '
>  	git log -1 --oneline resolution >tmp &&
>  	cat <<-EOF >>tmp &&
> +	diff --git a/file_or_directory b/file_or_directory
> +	remerge CONFLICT (file/directory): directory in the way of file_or_directory from HASH (side1); moving it to file_or_directory~HASH (side1) instead.
>  	diff --git a/file_or_directory~HASH (side1) b/wanted_content
>  	similarity index 100%
>  	rename from file_or_directory~HASH (side1)
>  	rename to wanted_content
> -	remerge CONFLICT (file/directory): directory in the way of file_or_directory from HASH (side1); moving it to file_or_directory~HASH (side1) instead.
>  	diff --git a/letters b/letters
>  	remerge CONFLICT (rename/rename): letters renamed to letters_side1 in HASH (side1) and to letters_side2 in HASH (side2).
>  	diff --git a/letters_side2 b/letters_side2

I would have liked to avoid touching this file altogether, of course, but
when I investigated, I came to the conclusion that while this patch still
does not produce the best possible outcome for remerge diffs, it does
improve upon the current situation.

The thing is, when mentioning that we had to rename `file_or_directory` to
`file_or_directory~HASH (side1)` to be able to write the file because a
directory of the same name already was in the way, I would actually have
expected this to come under the diff header

	diff --git a/file_or_directory b/file_or_directory~HASH (side1)

Previously, it was under the header

	diff --git a/file_or_directory~HASH (side1) b/wanted_content

and with this patch it is under

	diff --git a/file_or_directory b/file_or_directory

which is still not ideal: It mentions only the original file name, not the
munged one.

When I looked into the implementation details I figured out that the
information is not quite at our fingertips. In
[`create_filepairs_for_header_only_notifications()`][create_filepairs], we
have access to the primary filename (in the above example,
`file_or_directory`) as the key of the entry in the strmap
`o->additional_path_headers`, which is [populated from
`path_messages`][additional-headers], which in turn [points to the
`conflicts` strmap][path_messages]. Note: this code has changed between
v2.37.2 and the current main branch, in v2.37.2 the information is still
copied from `conflicts` to `path_messages`.

So basically, when we generate that diff header, we know the primary path
(great!) and we have access to a strmap that points to a string list of
conflict messages. So from where do we get that munged path name? The
string list of conflict messages [has `util` pointers to `struct
logical_conflict_info` data][conflicts-strmap], which does contain the
list of involved paths. This should be enough, right? But:

- `struct logical_conflict_info` is declared locally in `merge-ort.c`, we
  do not have access to that information in `diff.c` (nor would we really
  _want_ to let that `merge-ort` implementation detail spill over into
  `diff.c`), and even if we _could_ declare it in `merge-ort.h`, there is
  still this problem:

- The path list in `struct logical_conflict_info` is arbitrary-sized, i.e.
  we could even have _three_ paths in there, and that does not fit into a
  standard diff header.

- Additionally, we generate _one_ diff header for _all_ of the conflict
  messages listed for a particular primary path in
  `additional_path_headers`. But each of these conflict messages relates
  to its very own list of non-primary paths. For example, one conflict
  message could talk about a file/directory conflict, another one could
  talk about conflicting renames to two different filenames (in a
  recursive merge, both is possible in the same merge). It is simply
  impossible to combine all of those conflict messages under a single
  diff header that matches all of the involved paths.

- Even aside from that, the remerge diff code seems to have a usability
  wart (or even bug) where restricting the diff via a pathspec that
  matches a secondary path but not the primary path of a conflict (think:
  renames) would _not_ show the conflict message to the user, it would be
  shown only if the pathspec also includes the primary path.

In general, I found it hard to think of a _really_ ideal design where to
present that remerge conflict information. After all, in a remerge diff,
we do not even write out any files, therefore that conflict message
suggesting that we renamed the file to a munged name is somewhat
misleading. Sure, we should mention the conflict between the file and the
directory, but since nothing was written to disk, there is not actually
any _need_ to mention a munged filename at all. And that goes even for the
part of the remerge diff shown in the patch quoted above, where it talks
about `file_or_directory~HASH (side1)` being renamed... That file never
had that filename...

So I am not quite sure where we want the remerge design to go from here.

In any case, this remerge aspect is safely outside the scope of this here
patch series, which means that this here patch series should not be
concerned with it. In this email, I just wanted to mention _why_ I did not
include any patch to address the remerge issues any further.

Ciao,
Dscho

[create_filepairs]: https://github.com/git/git/blob/v2.37.2/diff.c#L6337-L6383
[additional-headers]: https://github.com/git/git/blob/v2.37.2/log-tree.c#L992
[path_messages]: https://github.com/git/git/blob/795ea8776be/merge-ort.c#L4846
[conflicts-strmap]: https://github.com/git/git/blob/795ea8776be/merge-ort.c#L351-L360



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux