Re: [PATCH 03/18] branch-diff: first rudimentary implementation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 3, 2018 at 8:30 AM, Johannes Schindelin
<johannes.schindelin@xxxxxx> wrote:

> Note: due to differences in the diff algorithm (`tbdiff` uses the
> Pythong module `difflib`, Git uses its xdiff fork), the cost matrix
> calculated by `branch-diff` is different (but very similar) to the one
> calculated by `tbdiff`. Therefore, it is possible that they find
> different matching commits in corner cases (e.g. when a patch was split
> into two patches of roughly equal length).

Does that mean, we may want to tweak the underlying diff parameters for
this special use case eventually?

>
> -#define COLOR_DUAL_MODE 2
> -

Leave this out in the first patch?

> @@ -19,6 +23,279 @@ static int parse_creation_weight(const struct option *opt, const char *arg,
>         return 0;
>  }
>
> +struct patch_util {
> +       /* For the search for an exact match */
> +       struct hashmap_entry e;
> +       const char *diff, *patch;
> +
> +       int i;
> +       int diffsize;
> +       size_t diff_offset;
> +       /* the index of the matching item in the other branch, or -1 */
> +       int matching;
> +       struct object_id oid;
> +};
> +
> +/*
> + * Reads the patches into a string list, with the `util` field being populated
> + * as struct object_id (will need to be free()d).
> + */
> +static int read_patches(const char *range, struct string_list *list)
> +{
> +       struct child_process cp = CHILD_PROCESS_INIT;
> +       FILE *in;
> +       struct strbuf buf = STRBUF_INIT, line = STRBUF_INIT;
> +       struct patch_util *util = NULL;
> +       int in_header = 1;
> +
> +       argv_array_pushl(&cp.args, "log", "--no-color", "-p", "--no-merges",
> +                       "--reverse", "--date-order", "--decorate=no",
> +                       "--no-abbrev-commit", range,
> +                       NULL);
> +       cp.out = -1;
> +       cp.no_stdin = 1;
> +       cp.git_cmd = 1;
> +
> +       if (start_command(&cp))
> +               return error_errno(_("could not start `log`"));
> +       in = fdopen(cp.out, "r");
> +       if (!in) {
> +               error_errno(_("could not read `log` output"));
> +               finish_command(&cp);
> +               return -1;
> +       }

With the implementation of --color-moved, there is an option
to buffer all diff output in memory e6e045f8031 (diff.c: buffer
all output if asked to, 2017-06-29), so I posit that running this
diff in-core may be "not too hard". Famous last words.

In addition to that patch, we'd have to buffer commit messages
and buffer multiple commits, as that only buffers a diff of a single
commit.

The benefit would be no invocation of new processes, letting us
do more in core. This would allow for tweaking revision walking
internally, e.g. passing of options to this command such as rename
detection factors, can be passed through easily without the need
of translating it back to the command line.
Let's read on.

> +
> +               if (starts_with(line.buf, "diff --git")) {

When using the internal buffers, you would not need to
string compare, but could just check for the
DIFF_SYMBOL_HEADER.

> +               } else if (starts_with(line.buf, "@@ "))
> +                       strbuf_addstr(&buf, "@@");

So we omit line information for hunks. Makes sense,
though then we could also skip the "index ..." lines?

Stefan



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux