Re: [RFC/PATCH] fix "git diff" to create wrong UTF-8 text

Tsugikazu Shibata <tshibata@xxxxxxxxxxxxx> writes:

> I believe this should be work for another language using UTF-8 and
> solve this issue.
> ...
> @@ -368,6 +394,7 @@ int xdl_emit_hunk_hdr(long s1, long c1,
>  		buf[nb++] = ' ';
>  		if (funclen > sizeof(buf) - nb - 1)
>  			funclen = sizeof(buf) - nb - 1;
> +		funclen = utf8width(func, funclen);
>  		memcpy(buf + nb, func, funclen);
>  		nb += funclen;
>  	}

I'd rather not do this at the xdiff/ level, for two reasons.

We consider the functions there strictly "borrowed code", and
ideally I would rather not even have that "chop down to funclen"
logic at that level.

The code at that level does not know what paths it is dealing
with and cannot consult git-specific data (i.e. attributes), so
doing it there would make it harder to enhance later by
introducing per-path encoding information.

How about...

 * (optional) lift the funclen limit from xdl_emit_hunk_hdr()
   and xdl_emit_diff() and have them preserve the full line;

 * Around ll.554 in diff.c::fn_out_consume(), look at
   line[i..eol] and apply the "chomp between characters" logic
   there.  I think it is very sensible to use the UTF-8 logic by
   default as you did above (but I suspect you may be able to
   reuse helper functions in utf8.c such as git_wcwidth(),
   instead of rolling your own).  Chomp the funcname line handed
   over by the xdiff layer in this function.

 * (future) make the length of the funcname line configurable
   either from the command line or configuration.

 * (future) add per-path blob encoding information (default to
   UTF-8) to struct emit_callback, and initialize it from the
   gitattributes(5) mechanism, just like we have added ws_rule
   recently there.  Use that to decide how the inter-character
   chomp you would introduce above in diff.c::fn_out_consume()
   works.

I think the two future enhancements I listed above as examples
would be cleaner to implement if the line chomping is done in
fn_out_consume().
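
To illustrate the "chomp between characters" idea, here is a minimal
sketch of a helper that fn_out_consume() could call.  The name
chomp_utf8() and its signature are made up for this illustration and
are not existing git code.  It assumes the line is valid UTF-8 and
limits the length in bytes, not display columns; a real patch would
probably want to build on the helpers in utf8.c instead.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of git: return the largest length
 * that is at most "limit" bytes and does not cut line[0..len) in
 * the middle of a UTF-8 multi-byte sequence.  Assumes the input is
 * valid UTF-8.
 */
static size_t chomp_utf8(const char *line, size_t len, size_t limit)
{
	size_t cut = len < limit ? len : limit;

	if (cut == len)
		return cut;	/* nothing is dropped, so nothing is split */

	/*
	 * line[cut] is the first byte that would be dropped.  If it is
	 * a continuation byte (10xxxxxx), the cut falls inside a
	 * character, so back up to that character's lead byte.
	 */
	while (cut > 0 && ((unsigned char)line[cut] & 0xC0) == 0x80)
		cut--;
	return cut;
}
```

With a funcname line made of three 3-byte characters, a limit of 4
bytes would keep only the first character (3 bytes) rather than
emitting a broken second one.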