Re: [PATCH v2 7/8] diff: add ability to insert additional headers for paths

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 28, 2021 at 2:57 AM Johannes Altmanninger <aclopte@xxxxxxxxx> wrote:
>
> On Sat, Dec 25, 2021 at 07:59:18AM +0000, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@xxxxxxxxx>
> >
> > When additional headers are provided, we need to
> >   * add diff_filepairs to diff_queued_diff for each paths in the
> >     additional headers map which, unless that path is part of
> >     another diff_filepair already found in diff_queued_diff
> >   * format the headers (colorization, line_prefix for --graph)
> >   * make sure the various codepaths that attempt to return early
> >     if there are "no changes" take into account the headers that
> >     need to be shown.
> >
> > Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
> > ---
> >  diff.c     | 116 +++++++++++++++++++++++++++++++++++++++++++++++++++--
> >  diff.h     |   3 +-
> >  log-tree.c |   2 +-
> >  3 files changed, 115 insertions(+), 6 deletions(-)
> >
> > diff --git a/diff.c b/diff.c
> > index 861282db1c3..aaa6a19f158 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -27,6 +27,7 @@
> >  #include "help.h"
> >  #include "promisor-remote.h"
> >  #include "dir.h"
> > +#include "strmap.h"
> >
> >  #ifdef NO_FAST_WORKING_DIRECTORY
> >  #define FAST_WORKING_DIRECTORY 0
> > @@ -3406,6 +3407,31 @@ struct userdiff_driver *get_textconv(struct repository *r,
> >       return userdiff_get_textconv(r, one->driver);
> >  }
> >
> > +static struct strbuf *additional_headers(struct diff_options *o,
> > +                                      const char *path)
> > +{
> > +     if (!o->additional_path_headers)
> > +             return NULL;
> > +     return strmap_get(o->additional_path_headers, path);
> > +}
> > +
> > +static void add_formatted_headers(struct strbuf *msg,
> > +                               struct strbuf *more_headers,
> > +                               const char *line_prefix,
> > +                               const char *meta,
> > +                               const char *reset)
> > +{
> > +     char *next, *newline;
> > +
> > +     for (next = more_headers->buf; *next; next = newline) {
> > +             newline = strchrnul(next, '\n');
> > +             strbuf_addf(msg, "%s%s%.*s%s\n", line_prefix, meta,
> > +                         (int)(newline - next), next, reset);
> > +             if (*newline)
> > +                     newline++;
> > +     }
> > +}
> > +
> >  static void builtin_diff(const char *name_a,
> >                        const char *name_b,
> >                        struct diff_filespec *one,
> > @@ -3464,6 +3490,17 @@ static void builtin_diff(const char *name_a,
> >       b_two = quote_two(b_prefix, name_b + (*name_b == '/'));
> >       lbl[0] = DIFF_FILE_VALID(one) ? a_one : "/dev/null";
> >       lbl[1] = DIFF_FILE_VALID(two) ? b_two : "/dev/null";
> > +     if (!DIFF_FILE_VALID(one) && !DIFF_FILE_VALID(two)) {
> > +             /*
> > +              * We should only reach this point for pairs from
> > +              * create_filepairs_for_header_only_notifications().  For
> > +              * these, we should avoid the "/dev/null" special casing
> > +              * above, meaning we avoid showing such pairs as either
> > +              * "new file" or "deleted file" below.
> > +              */
> > +             lbl[0] = a_one;
> > +             lbl[1] = b_two;
> > +     }
>
> not so familiar with this logic, but I saw that without this change, the
> rename/rename conflict test fails. Is this because we add a file pair under
> the original name (that's been renamed on both sides). I wonder if we
> can sketch such a case in the comment.

That may be the only current test in the testsuite that fails without
this bit of logic, but I don't want the comment to be specific to the
rename/rename case.  Whenever we have a conflict/warning/whatever
message from the merge machinery tied to a path which doesn't show up
in either the automatic merge or the recorded merge commit, we will
hit this situation.  Even if I were to give a complete listing of all
the current cases, more could be added in the future.

> > +static void create_filepairs_for_header_only_notifications(struct diff_options *o)
> > +{
> > +     struct strset present;
> > +     struct diff_queue_struct *q = &diff_queued_diff;
> > +     struct hashmap_iter iter;
> > +     struct strmap_entry *e;
> > +     int i;
> > +
> > +     strset_init_with_options(&present, /*pool*/ NULL, /*strdup*/ 0);
> > +
> > +     /*
> > +      * Find out which paths exist in diff_queued_diff, preferring
> > +      * one->path for any pair that has multiple paths.
>
> Why do we prefer one->path?

run_diff() sets name = one->path, passes it along to run_diff_cmd(),
and from there it goes to fill_metainfo() and either
run_external_diff() or builtin_diff().

I'm wondering if I should just ignore two->path entirely and only use
one->path; I think I partially looked at both because of various
places in diff.c that already do but give preferential treatment to
one->path (diffnamecmp(), the calls to show_submodule*diff*(), what is
passed to write_name_quoted() in diff_flush_raw()).

> > +      */
> > +     for (i = 0; i < q->nr; i++) {
> > +             struct diff_filepair *p = q->queue[i];
> > +             char *path = p->one->path ? p->one->path : p->two->path;
> > +
> > +             if (strmap_contains(o->additional_path_headers, path))
> > +                     strset_add(&present, path);
> > +     }
> > +
> > +     /*
> > +      * Loop over paths in additional_path_headers; for each NOT already
> > +      * in diff_queued_diff, create a synthetic filepair and insert that
> > +      * into diff_queued_diff.
> > +      */
> > +     strmap_for_each_entry(o->additional_path_headers, &iter, e) {
> > +             if (!strset_contains(&present, e->key)) {
> > +                     struct diff_filespec *one, *two;
> > +                     struct diff_filepair *p;
> > +
> > +                     one = alloc_filespec(e->key);
> > +                     two = alloc_filespec(e->key);
> > +                     fill_filespec(one, null_oid(), 0, 0);
> > +                     fill_filespec(two, null_oid(), 0, 0);
> > +                     p = diff_queue(q, one, two);
> > +                     p->status = DIFF_STATUS_MODIFIED;
> > +             }
> > +     }
>
> All these string hash-maps are not really typical for a C program. I'm sure
> they are the best choice for an advanced merge algorithm

Agreed up to here.

> but they are not
> really necessary for computing/printing a diff.

Technically agree that it _could_ be solved a different way, but the
strmaps are a much more natural solution to this problem in this
particular case; more on this below.

> It feels like this is an
> implementation detail from merge-ort that's leaking into other components.

And I disagree here, on _both_ the explicit point and the underlying
suggestion that you seem to be making that strmap should be avoided
outside of merging.  The strmap.[ch] type was originally a suggestion
from Peff for areas of git completely unrelated to merging (see the
beginning of https://lore.kernel.org/git/20200821194857.GD1165@xxxxxxxxxxxxxxxxxxxxxxx/,
and the first link in that email).  It's a new datatype for git, much
like strbuf or string_list or whatever before it, that is there to be
used when it's a natural fit for the problem at hand.  The lack of
strmap previously led folks to abuse other existing data structures
(and in a way that often led to poor performance to boot).

> What we want to do is
>
>         for file_pair in additional_headers:
>                 if not already_queued(file_pair):
>                         queue(file_pair)

Yes, precisely.

> to do that, you use a temporary has-set ("present") that records everything
> that's already queued (already_queued() is a lookup in that set).
>
> Let's assume both the queue and additional_headers are sorted arrays.

That's a bad assumption; we can't rely on *either* being sorted.  I
actually started my implementation by trying exactly what you mention
first; I too thought it'd be more natural and clearer to do this.  Of
course, before implementing it, I had to verify whether
diff_queued_diff was sorted.  So, I added some code that would check
the order and fail if the queue wasn't sorted.  7 of the test files in
the regression testsuite had one or more failing tests.

I think the queue was intended to be sorted (see
diffcore_fix_diff_index()), but in practice it's not.  And I'm worried
that if I find the current cases where it fails to be sorted and "fix"
them (though I don't actually know if this was intentional or not so I
don't know if that's really a fix or a break), that I'd end up with
additional cases in the future where they fail to be sorted anyway.
So, no matter what, relying on diff_queued_diff being sorted seems
ill-advised.

Also...

> Then we could efficiently merge them (like a merge-sort algorithm)
> without ever allocating a temporary hash map.
>
> I haven't checked if this is practical (better wait for feedback).
> We'd probably need to convert the strmap additional_path_headers into an
> array and sort it (I guess our hash map does not guarantee any ordering?)

Right, strmap has no ordering either.  I was willing to stick those
into a string_list and sort them, but making temporary copies of both
the strmap and the diff_queued_diff just to sort them so that I can
reasonably cheaply ask "are items from this thing present in this
other thing?" seems to be stretching things a bit too far.
maps/hashes provide a very nice "is this item present" lookup and are
a natural way to ask that.  Since that is exactly the question I am
asking, I think they are the better data structure here.  So, this was
not at all a leak of merge-ort datastructures, but rather a picking of
the appropriate data structures for the problem at hand.

> > +
> > +     /* Re-sort the filepairs */
> > +     diffcore_fix_diff_index();
> > +
> > +     /* Cleanup */
> > +     strset_clear(&present);
>
> Not a strong opinion, but I'd probably drop this comment
>
> > +}
> > +
> >  static void diff_flush_patch_all_file_pairs(struct diff_options *o)
> >  {
> >       int i;
> > @@ -6337,6 +6442,9 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
> >       if (o->color_moved)
> >               o->emitted_symbols = &esm;
> >
> > +     if (o->additional_path_headers)
> > +             create_filepairs_for_header_only_notifications(o);
> > +
> >       for (i = 0; i < q->nr; i++) {
> >               struct diff_filepair *p = q->queue[i];
> >               if (check_pair_status(p))
> > @@ -6413,7 +6521,7 @@ void diff_flush(struct diff_options *options)
> >        * Order: raw, stat, summary, patch
> >        * or:    name/name-status/checkdiff (other bits clear)
> >        */
> > -     if (!q->nr)
> > +     if (!q->nr && !options->additional_path_headers)
> >               goto free_queue;
> >
> >       if (output_format & (DIFF_FORMAT_RAW |
> > diff --git a/diff.h b/diff.h
> > index 8ba85c5e605..06a0a67afda 100644
> > --- a/diff.h
> > +++ b/diff.h
> > @@ -395,6 +395,7 @@ struct diff_options {
> >
> >       struct repository *repo;
> >       struct option *parseopts;
> > +     struct strmap *additional_path_headers;
> >
> >       int no_free;
> >  };
> > @@ -593,7 +594,7 @@ void diffcore_fix_diff_index(void);
> >  "                show all files diff when -S is used and hit is found.\n" \
> >  "  -a  --text    treat all files as text.\n"
> >
> > -int diff_queue_is_empty(void);
> > +int diff_queue_is_empty(struct diff_options*);
> >  void diff_flush(struct diff_options*);
> >  void diff_free(struct diff_options*);
> >  void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc);
> > diff --git a/log-tree.c b/log-tree.c
> > index d4655b63d75..33c28f537a6 100644
> > --- a/log-tree.c
> > +++ b/log-tree.c
> > @@ -850,7 +850,7 @@ int log_tree_diff_flush(struct rev_info *opt)
> >       opt->shown_dashes = 0;
> >       diffcore_std(&opt->diffopt);
> >
> > -     if (diff_queue_is_empty()) {
> > +     if (diff_queue_is_empty(&opt->diffopt)) {
> >               int saved_fmt = opt->diffopt.output_format;
> >               opt->diffopt.output_format = DIFF_FORMAT_NO_OUTPUT;
> >               diff_flush(&opt->diffopt);
> > --
> > gitgitgadget
> >



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux