From: Lars Schneider <larsxschneider@xxxxxxxxx>

Hi,

Patches 1-6 and 9 are preparation and helper functions.
Patches 7, 8, and 10 are the actual change.

This series is based on v2.16.0 and Torsten's 8462ff43e4 (convert_to_git():
safe_crlf/checksafe becomes int conv_flags, 2018-01-13). The series can be
rebased without conflicts on top of v2.17.0:
https://github.com/larsxschneider/git/tree/encoding-2.17

Changes since v12:

* commit message improvement (Torsten)
* prevent undefined memcmp behavior in has_bom_prefix (Avar)
* improve error message: true/false are no valid working-tree-encodings (Torsten)
* fix crash in same_encoding() if only one argument is NULL
  (this bug was already present before this series, Eric)

Thanks,
Lars


  RFC: https://public-inbox.org/git/BDB9B884-6D17-4BE3-A83C-F67E2AFA2B46@xxxxxxxxx/
   v1: https://public-inbox.org/git/20171211155023.1405-1-lars.schneider@xxxxxxxxxxxx/
   v2: https://public-inbox.org/git/20171229152222.39680-1-lars.schneider@xxxxxxxxxxxx/
   v3: https://public-inbox.org/git/20180106004808.77513-1-lars.schneider@xxxxxxxxxxxx/
   v4: https://public-inbox.org/git/20180120152418.52859-1-lars.schneider@xxxxxxxxxxxx/
   v5: https://public-inbox.org/git/20180129201855.9182-1-tboegi@xxxxxx/
   v6: https://public-inbox.org/git/20180209132830.55385-1-lars.schneider@xxxxxxxxxxxx/
   v7: https://public-inbox.org/git/20180215152711.158-1-lars.schneider@xxxxxxxxxxxx/
   v8: https://public-inbox.org/git/20180224162801.98860-1-lars.schneider@xxxxxxxxxxxx/
   v9: https://public-inbox.org/git/20180304201418.60958-1-lars.schneider@xxxxxxxxxxxx/
  v10: https://public-inbox.org/git/20180307173026.30058-1-lars.schneider@xxxxxxxxxxxx/
  v11: https://public-inbox.org/git/20180309173536.62012-1-lars.schneider@xxxxxxxxxxxx/
  v12: https://public-inbox.org/git/20180315225746.18119-1-lars.schneider@xxxxxxxxxxxx/


Base Ref:
Web-Diff: https://github.com/larsxschneider/git/commit/3aa98e6975
Checkout: git fetch https://github.com/larsxschneider/git encoding-v13 && git checkout 3aa98e6975


### Interdiff (v12..v13):

diff --git a/convert.c b/convert.c
index 2a002af66d..1ae6301629 100644
--- a/convert.c
+++ b/convert.c
@@ -1222,7 +1222,7 @@ static const char *git_path_check_encoding(struct attr_check_item *check)
 		return NULL;
 
 	if (ATTR_TRUE(value) || ATTR_FALSE(value)) {
-		die(_("working-tree-encoding attribute requires a value"));
+		die(_("true/false are no valid working-tree-encodings"));
 	}
 
 	/* Don't encode to the default encoding */
diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 884f0878b1..12b8eb963a 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -152,7 +152,7 @@ test_expect_success 'check unsupported encodings' '
 	echo "*.set text working-tree-encoding" >.gitattributes &&
 	printf "set" >t.set &&
 	test_must_fail git add t.set 2>err.out &&
-	test_i18ngrep "working-tree-encoding attribute requires a value" err.out &&
+	test_i18ngrep "true/false are no valid working-tree-encodings" err.out &&
 
 	echo "*.unset text -working-tree-encoding" >.gitattributes &&
 	printf "unset" >t.unset &&
diff --git a/utf8.c b/utf8.c
index 2d8821d36e..25d366d6b3 100644
--- a/utf8.c
+++ b/utf8.c
@@ -428,8 +428,12 @@ int is_encoding_utf8(const char *name)
 
 int same_encoding(const char *src, const char *dst)
 {
-	if (is_encoding_utf8(src) && is_encoding_utf8(dst))
-		return 1;
+	static const char utf8[] = "UTF-8";
+
+	if (!src)
+		src = utf8;
+	if (!dst)
+		dst = utf8;
 	if (same_utf_encoding(src, dst))
 		return 1;
 	return !strcasecmp(src, dst);
@@ -559,7 +563,7 @@ char *reencode_string_len(const char *in, int insz,
 static int has_bom_prefix(const char *data, size_t len,
 			  const char *bom, size_t bom_len)
 {
-	return (len >= bom_len) && !memcmp(data, bom, bom_len);
+	return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
 }
 
 static const char utf16_be_bom[] = {0xFE, 0xFF};


### Patches

Lars Schneider (10):
  strbuf: remove unnecessary NUL assignment in xstrdup_tolower()
  strbuf: add xstrdup_toupper()
  strbuf: add a case insensitive starts_with()
  utf8: teach same_encoding() alternative UTF encoding names
  utf8: add function to detect prohibited UTF-16/32 BOM
  utf8: add function to detect a missing UTF-16/32 BOM
  convert: add 'working-tree-encoding' attribute
  convert: check for detectable errors in UTF encodings
  convert: add tracing for 'working-tree-encoding' attribute
  convert: add round trip check based on 'core.checkRoundtripEncoding'

 Documentation/config.txt         |   6 +
 Documentation/gitattributes.txt  |  88 +++++++++++++
 config.c                         |   5 +
 convert.c                        | 276 ++++++++++++++++++++++++++++++++++++++-
 convert.h                        |   2 +
 environment.c                    |   1 +
 git-compat-util.h                |   1 +
 sha1_file.c                      |   2 +-
 strbuf.c                         |  22 +++-
 strbuf.h                         |   1 +
 t/t0028-working-tree-encoding.sh | 245 ++++++++++++++++++++++++++++++++++
 utf8.c                           |  65 ++++++++-
 utf8.h                           |  28 ++++
 13 files changed, 737 insertions(+), 5 deletions(-)
 create mode 100755 t/t0028-working-tree-encoding.sh


base-commit: 8a2f0888555ce46ac87452b194dec5cb66fb1417
-- 
2.16.2
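
For readers who want to try the two NULL-handling changes from the utf8.c interdiff outside of git, here is a minimal standalone sketch. It is not the code from the series: the `_sketch` names, `is_utf8_name()`, and `sketch_checks()` are hypothetical, and `same_utf_encoding_sketch()` is a simplified stand-in for git's `same_utf_encoding()`, whose body is not shown in the interdiff. The point it illustrates is that passing a NULL pointer to `strcasecmp()` or `memcmp()` is undefined behavior, so both helpers deal with NULL before making those calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

/* Hypothetical helper: accepts only "UTF-8" and "UTF8", in any case. */
static int is_utf8_name(const char *name)
{
	return !strcasecmp(name, "UTF-8") || !strcasecmp(name, "UTF8");
}

/* Simplified stand-in (assumption) for git's same_utf_encoding(). */
static int same_utf_encoding_sketch(const char *src, const char *dst)
{
	return is_utf8_name(src) && is_utf8_name(dst);
}

/* Follows the v13 logic: a NULL encoding name is treated as "UTF-8". */
static int same_encoding_sketch(const char *src, const char *dst)
{
	static const char utf8[] = "UTF-8";

	if (!src)
		src = utf8;
	if (!dst)
		dst = utf8;
	if (same_utf_encoding_sketch(src, dst))
		return 1;
	return !strcasecmp(src, dst);
}

/* Follows the v13 logic: memcmp() is only reached with non-NULL pointers. */
static int has_bom_prefix_sketch(const char *data, size_t len,
				 const char *bom, size_t bom_len)
{
	return data && bom && (len >= bom_len) && !memcmp(data, bom, bom_len);
}

/* Usage: exercises the cases where one or both arguments are NULL. */
void sketch_checks(void)
{
	assert(same_encoding_sketch(NULL, NULL));
	assert(same_encoding_sketch(NULL, "utf8"));
	assert(!same_encoding_sketch(NULL, "ISO-8859-1"));
	assert(!has_bom_prefix_sketch(NULL, 0, "\xFE\xFF", 2));
	assert(has_bom_prefix_sketch("\xFE\xFF" "xy", 4, "\xFE\xFF", 2));
}
```

Calling `sketch_checks()` from a test program runs the assertions. With only one NULL argument, `same_encoding_sketch()` compares the other name against "UTF-8" instead of dereferencing NULL.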