"ZHANG, Le" <r0bertz@xxxxxxxxxx> writes: > When this parameter is specified, patch will be converted to a target > encoding before applied. The target encoding defaults to UTF-8. It > could also be specified by i18n.patchencoding. > > Signed-off-by: ZHANG, Le <r0bertz@xxxxxxxxxx> > --- Ah, please forget (most of) what I said in my review of 2/4. This series is not about incoming e-mails that misidentify their content encoding, but is about accepting patches from e-mails that are in different encoding and correctly says which encoding they are in. We make mailinfo convert the payload to UTF-8 or "i18n.patchencoding". The above is a clear indication that "i18n.patchencoding" is grossly misnamed. It sounds as if it is about the encoding of patches, and because we are discussing "mailinfo", I mistook that is about the encoding used in the incoming patches. But it is not. It is about encoding of blobs and paths used in the repository. With your series, the mailinfo codepath that deals with incoming patches happens to become the first user to pay attention to the repository data encoding, but that is not a good reason to name it "i18n.patchencoding". In the longer run, it is entirely plausible that we will want to know about different encodings used for these things: - encoding in which blobs and tree paths are stored in the repository. This is what you called "i18n.patchencoding". - encoding in which the log messages of the commits are stored in the repository. We have "i18n.commitencoding" variable for that. - encoding in which incoming patches are given to "mailinfo". This is read from the e-mail message. - encoding in which blobs are checked out to the working tree. In a distant future, we may want to re-encode blobs from the repository encoding to this encoding while checking out, and the other way while checking in. - encoding in which readdir(3) returns paths from the working tree. In a distant future, we may want to re-encode paths from the repository encoding to this encoding while checking out, and the other way while checking in. I think as a short-term solution to your immediate issue, your series may be good enough, but if we are to go this route, we really should make it "the repository encoding". And the variable that replaces your "patch_charset" should be declared in cache.h, defined in environment.c, parsed by git_default_config(), and be accessible to everybody. An alternative is to use i18n.commitencoding for that purpose, but that will cause issues with existing repositories---they have been happy without us recoding the payload, and they will get upset if we suddenly start doing so. So I'd say that a new "i18n.repositoryencoding" configuration variable and repository_encoding variable would be a better design in the longer term. They (1) default to "binary" (or "bytestring", "literal", "verbatim", whatever) to tell us _not_ to do any encoding conversion to the contents when accepting patches, generating patches, checking out, checking in, or running git-archive; and (2) when set, certain operations will pay attention to it; in your situation, "mailinfo" will convert incoming e-mails to its value (presumably "UTF-8"). Yet even longer term, we probably would need to make the blob encoding into an attribute to the path (e.g. think about an i18n documentation project, where different translations may need to be stored in different encodings). This will open another big can of worms, though. Unless the incoming e-mail is split into a MIME multipart that contains one-patch per file with each part in different encoding, a single plaintext message needs to be in a single encoding, so your mailinfo patch would most likely need to encode to one canonical encoding (i.e. "UTF-8"), and then reencode to per path encoding when feeding the patch. Then there is another issue of what to do with the tree paths that appear in "diff --git" and "rename from ..." headers which most likely to be a single canonical encoding in the project. But we need to start from somewhere, so let's make the first step "a single project wide blob and tree path encoding". Opinions? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html