Re: [PATCH v7 0/7] convert: add support for different encodings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Feb 24, 2018 at 04:18:36PM +0100, Lars Schneider wrote:

> > We always use the in-repo contents when generating 'diff'.  I think
> > by "attribute to be used in diff", what you are reallying after is
> > to convert the in-repo contents to that encoding _BEFORE_ running
> > 'diff' on it.  E.g. in-repo UTF-16 that can have NUL bytes all over
> > the place will not diff well with the xdiff machinery, but if you
> > first convert it to UTF-8 and have xdiff work on it, you can get
> > reasonable result out of it.  It is unclear what encoding you want
> > your final diff output in (it is equally valid in such a set-up to
> > desire your patch output in UTF-16 or UTF-8), but assuming that you
> > want UTF-8 in your patch output, perhaps we do not have to break
> > gitk users by hijacking the 'encoding' attribute.  Instead what you
> > want is a single bit that says between in-repo or working tree which
> > representation should be given to the xdiff machinery.
> 
> I fear that we could confuse users with an additional knob/bit that
> defines what we diff against. Git always diff'ed against in-repo 
> content and I feel it should stay that way.

Well, except for textconv. You can already do this:

  echo "foo diff=utf16" >.gitattributes
  git config diff.utf16.textconv 'iconv -f utf16 -t utf8'

We could make that easier to use and much more efficient by:

  1. Allowing a special syntax for textconv filters that kicks off an
     internal iconv.

  2. Providing baked-in config for utf16.

The patch below provides a sketch. But I think Torsten raised a good
point that you might want the encoding conversion to be independent of
other diff characteristics (so, e.g., you might say "this is utf16 but
once converted treat it like C code for finding funcnames, etc").

---
diff --git a/diff.c b/diff.c
index 21c3838b25..04032e059c 100644
--- a/diff.c
+++ b/diff.c
@@ -5968,6 +5968,21 @@ struct diff_filepair *diff_unmerge(struct diff_options *options, const char *pat
 	return pair;
 }
 
+static char *iconv_textconv(const char *encoding, struct diff_filespec *spec,
+			    size_t *outsize)
+{
+	char *ret;
+	int outsize_int; /* this really should be a size_t */
+
+	if (diff_populate_filespec(spec, 0))
+		die("unable to load content for %s", spec->path);
+	ret = reencode_string_len(spec->data, spec->size,
+				  "utf-8", /* should be log_output_encoding? */
+				  encoding, &outsize_int);
+	*outsize = outsize_int;
+	return ret;
+}
+
 static char *run_textconv(const char *pgm, struct diff_filespec *spec,
 		size_t *outsize)
 {
@@ -5978,6 +5993,9 @@ static char *run_textconv(const char *pgm, struct diff_filespec *spec,
 	struct strbuf buf = STRBUF_INIT;
 	int err = 0;
 
+	if (skip_prefix(pgm, "iconv:", &pgm))
+		return iconv_textconv(pgm, spec, outsize);
+
 	temp = prepare_temp_file(spec->path, spec);
 	*arg++ = pgm;
 	*arg++ = temp->name;
diff --git a/userdiff.c b/userdiff.c
index dbfb4e13cd..48fa7e8bdd 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -161,6 +161,7 @@ IPATTERN("css",
 	 "-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */
 	 "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
 ),
+{ "utf16", NULL, -1, { NULL, 0 }, NULL, "iconv:utf16" },
 { "default", NULL, -1, { NULL, 0 } },
 };
 #undef PATTERNS



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux