Re: encoding bug in git.el

Hi:

On Wed, May 21, 2008 at 7:31 AM, David Kågedal <davidk@xxxxxxxxxxxxxx> wrote:
> Karl Hasselström <kha@xxxxxxxxxxx> writes:
>
>> Recently, some commits started misrecording the "ö" in my name. (In
>> emacs, for example, it looks like this in a utf8 buffer:
>> Hasselstr\201\366m.) I'm guessing there's an extra latin1->utf8
>> conversion in there somewhere.
>
> The \201 looks more like Emacs' internal mule encoding, where
> everything that isn't ASCII is prefixed with \201 or something
> similar.

Thanks for reporting this.

I concur. This is not a UTF-8 conversion but Emacs's internal MULE
encoding. I suspect the U+00F6 character is read into the *git-commit*
buffer as latin-1 when git.el displays the Author line, and Emacs then
writes it out as the two bytes 0x81 0xF6, because that is the Emacs
internal buffer code for U+00F6.
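
To make the byte values concrete, here is a small sketch (not part of
git.el) that lists the bytes Emacs produces for U+00F6 under a few
coding systems. It assumes Emacs 23 or later, where "\u00f6" reads as
a multibyte character; demo-encoded-bytes is a made-up helper name.
The \201\366 in Karl's report is octal for 0x81 0xF6, which matches
neither the latin-1 nor the utf-8 result.

```elisp
;; `encode-coding-string' returns a unibyte string, so `string-to-list'
;; yields the raw byte values.
(defun demo-encoded-bytes (string coding)
  "Return the bytes of STRING encoded with CODING as a list of integers.
Hypothetical helper for illustration only; not part of git.el."
  (string-to-list (encode-coding-string string coding)))

(demo-encoded-bytes "\u00f6" 'latin-1)    ; (246), i.e. 0xF6
(demo-encoded-bytes "\u00f6" 'utf-8)      ; (195 182), i.e. 0xC3 0xB6
;; Compare this one with the 0x81 0xF6 seen in the bad commits:
(demo-encoded-bytes "\u00f6" 'emacs-mule)
```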

This happens because git.el, when it runs git-commit-tree, always
redefines environment variables such as GIT_AUTHOR_NAME. The
difference is that prior to commit dbe482, "env" handled the encoding,
while commit dbe482 lets Emacs's process-environment handle it.
Unfortunately, in the latter case the string is passed without being
properly re-encoded.

Here is a proposed fix. I suggest that process-environment should be
given these envvars already encoded as shown in this code sample:

------------------ git.el ------------------
[not a proper git-diff]
@@ -216,6 +216,11 @@ and `git-diff-setup-hook'."
   "Build a list of NAME=VALUE strings from a list of environment strings."
   (mapcar (lambda (entry) (concat (car entry) "=" (cdr entry))) env))

+(defun git-get-env-strings-encoded (env encoding)
+  "Build a list of NAME=VALUE strings from a list of environment strings,
+converting from mule-encoding to ENCODING (e.g. mule-utf-8, latin-1, etc)."
+  (mapcar (lambda (entry) (concat (car entry) "=" (encode-coding-string (cdr entry) encoding))) env))
+
 (defun git-call-process-env (buffer env &rest args)
   "Wrapper for call-process that sets environment strings."
   (let ((process-environment (append (git-get-env-strings env)
@@ -265,7 +270,7 @@ and returns the process output as a string, or nil if the git failed."

 (defun git-run-command-region (buffer start end env &rest args)
   "Run a git command with specified buffer region as input."
-  (unless (eq 0 (let ((process-environment (append (git-get-env-strings env)
+  (unless (eq 0 (let ((process-environment (append (git-get-env-strings-encoded env coding-system-for-write)
                                                    process-environment)))
                   (git-run-process-region
                    buffer start end "git" args)))

The buffer text is written out using coding-system-for-write, while
the GIT_* envvars were not encoded at all. So when appending them to
process-environment, encode them with that same coding system.

(Reminder: the *git-commit* buffer's encoding is based on the git
config i18n.commitencoding, which in turn sets
buffer-file-coding-system, which in turn sets coding-system-for-write)
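
As a rough check of that idea, the sketch below (again not part of
git.el) encodes an environment alist and confirms that the value ends
up as the same bytes the commit text would get. It assumes Emacs 23 or
later; demo-env-strings-encoded is a made-up stand-in for the helper
in the patch, and the fixed 'utf-8 stands in for whatever
coding-system-for-write is bound to at commit time.

```elisp
(defun demo-env-strings-encoded (env coding)
  "Return ENV, an alist of (NAME . VALUE), as a list of NAME=VALUE strings.
Each VALUE is encoded with CODING.  Hypothetical stand-in for the
helper proposed in the patch; for illustration only."
  (mapcar (lambda (entry)
            (concat (car entry) "="
                    (encode-coding-string (cdr entry) coding)))
          env))

(let* ((coding 'utf-8)              ; assumed value of coding-system-for-write
       (name "Hasselstr\u00f6m")
       (entry (car (demo-env-strings-encoded
                    (list (cons "GIT_AUTHOR_NAME" name)) coding))))
  ;; The bytes after "GIT_AUTHOR_NAME=" equal the bytes the same name
  ;; would get when the buffer text is written with CODING.
  (equal (substring entry (length "GIT_AUTHOR_NAME="))
         (encode-coding-string name coding)))
```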

I tested this with U+00F6 in GIT_AUTHOR_NAME, in git config user.name,
and in the commit text, and I think the problem is fixed. Please
review it. Also, I am not sure whether this fix needs to be propagated
to the other places where process-environment is redefined, so YMMV.

(Lastly, while testing this for Japanese, I'm having some encoding
problems with Meadow (Emacs on Windows), msysgit (git on Windows),
set-language-environment Japanese, utf-8, and M-x git-commit-file,
but I don't think it's related to this exact problem. Hopefully.)

>> It turns out that the breakage occurs when I commit with the
>> git-status mode from git.el, and it was introduced by this commit:
>>
>>   commit dbe48256b41c1e94d81f2458d7e84b1fdcb47026
>>   Author: Clifford Caoile <piyo@xxxxxxxxxxxxxxxxxxxxx>
>>
>>       git.el: Set process-environment instead of invoking env

:-)

This must be the reason why process-environment wasn't used in all places.

>> It's in master, but not yet in maint. (In fact, it's the _only_ change
>> to contrib/emacs that's in master but not in maint.)

Please forgive my ignorance, but what does this mean?

Best regards,
Clifford Caoile