> On 07 Jan 2018, at 10:38, Torsten Bögershausen <tboegi@xxxxxx> wrote: > > On Sat, Jan 06, 2018 at 01:48:01AM +0100, lars.schneider@xxxxxxxxxxxx wrote: >> From: Lars Schneider <larsxschneider@xxxxxxxxx> >> >> Hi, >> >> Patches 1-5 and 6 are helper functions and preparation. >> Patch 6 is the actual change. >> >> I am still torn between "checkout-encoding" and "working-tree-encoding" >> as attribute name. I am happy to hear arguments for/against one or the >> other. > > checkout-encoding is probably misleading, as it is even the checkin-encoding. Yeah, I start to think the same. > What is wrong with working-tree-encoding ? > I think the 2 "-". > > What was wrong with workingtree-encoding ? Yeah, the two dashes are a minor annoyance. However, consider this: $ git grep 'working tree' -- '*.txt' | wc -l 570 $ git grep 'working-tree' -- '*.txt' | wc -l 6 $ git grep 'workingtree' -- '*.txt' | wc -l 0 $ git grep 'working tree' -- po | wc -l 704 $ git grep 'working-tree' -- po | wc -l 0 $ git grep 'workingtree' -- po | wc -l 0 I think "working tree" is a pretty established term that endusers might be able to understand. Therefore, I would like to go with "working-tree-encoding" as it was written that way at least 6 times in the Git tree before. Would that work for you? > Or > workdir-encoding ? Although I like the shortness, the term "workdir" might already be occupied [1]. Could that cause confusion? [1] 4f01748d51 (contrib/workdir: add a simple script to create a working directory, 2007-03-27) >> >> * Removed unnecessary NUL assignment in xstrdup_tolower() (Torsten) >> >> * Set "git config core.eol lf" to made the test run on Windows (Dscho) >> >> * Made BOM arrays static (Ramsay) > > > Some comments: > > I would like to have the CRLF conversion a little bit more strict - > many users tend to set core.autocrlf=true or write "* text=auto" > in the .gitattributes. > Reading all the effort about BOM markers and UTF-16LE, I think there > should ne some effort to make the line endings round trip. > Therefore I changed convert.c to demand that the "text" attribute > is set to enable CRLF conversions. > (If I had submitted the patch, I would have demanded > "text eol=lf" or "text eol=crlf", but the test case t0028 indicates > that there is a demand to produce line endings as configured in core.eol) But wouldn't that be inconvenient for the users? E.g. if I add a UTF-16 file on Windows with CRLF then it would be nice if Git would automatically convert the line endings to LF on Linux, no? IOW: Why should we handle text files that have a defined checkout-encoding differently compared to UTF-8 encoded text files? Wouldn't that be unexpected to the user? Thanks, Lars > > Anyway, I rebased it onto git.git/master, changed the docu, and pushed it to > https://github.com/tboegi/git/tree/180107-0935-For-lars-schneider-encode-V3B > > Here is a inter-diff against your version: > > diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt > index 1bc03e69c..b8d9f91c8 100644 > --- a/Documentation/gitattributes.txt > +++ b/Documentation/gitattributes.txt > @@ -281,7 +281,7 @@ interpreted as binary and consequently built-in Git text processing > tools (e.g. 'git diff') as well as most Git web front ends do not > visualize the content. > > -In these cases you can teach Git the encoding of a file in the working > +In these cases you can tell Git the encoding of a file in the working Oops. I meant to change that already. Thanks! > directory with the `checkout-encoding` attribute. If a file with this > attributes is added to Git, then Git reencodes the content from the > specified encoding to UTF-8 and stores the result in its internal data > @@ -308,17 +308,20 @@ Use the `checkout-encoding` attribute only if you cannot store a file in > UTF-8 encoding and if you want Git to be able to process the content as > text. > > +Note that when `checkout-encoding` is defined, by default the line > +endings are not converted. `text=auto` and core.autocrlf are ignored. > +Set the `text` attribute to enable CRLF conversions. > + > Use the following attributes if your '*.txt' files are UTF-16 encoded > -with byte order mark (BOM) and you want Git to perform automatic line > -ending conversion based on your platform. > +with byte order mark (BOM). > > ------------------------ > -*.txt text checkout-encoding=UTF-16 > +*.txt checkout-encoding=UTF-16 > ------------------------ > > Use the following attributes if your '*.txt' files are UTF-16 little > -endian encoded without BOM and you want Git to use Windows line endings > -in the working directory. > +endian encoded without BOM and you want Git to use LF in the repo and > +CRLF in the working directory. > > ------------------------ > *.txt checkout-encoding=UTF-16LE text eol=CRLF > diff --git a/convert.c b/convert.c > index 13f766d2a..1e29f515e 100644 > --- a/convert.c > +++ b/convert.c > @@ -221,18 +221,27 @@ static void check_global_conv_flags_eol(const char *path, enum crlf_action crlf_ > } > } > > > static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats, > @@ -432,7 +441,7 @@ static int crlf_to_git(const struct index_state *istate, > * cherry-pick. > */ > if ((!(conv_flags & CONV_EOL_RENORMALIZE)) && > - has_cr_in_index(istate, path)) > + has_crlf_in_index(istate, path)) > convert_crlf_into_lf = 0; > } > if (((conv_flags & CONV_EOL_RNDTRP_WARN) || > @@ -1214,9 +1223,28 @@ static void convert_attrs(struct conv_attrs *ca, const char *path) > ca->crlf_action = git_path_check_crlf(ccheck + 0); > ca->ident = git_path_check_ident(ccheck + 1); > ca->drv = git_path_check_convert(ccheck + 2); > + ca->checkout_encoding = git_path_check_encoding(ccheck + 5); > if (ca->crlf_action != CRLF_BINARY) { > enum eol eol_attr = git_path_check_eol(ccheck + 3); > - if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF) > + if (ca->checkout_encoding) { > + enum crlf_action crlf_action = CRLF_BINARY; > + /* > + * encoded files don't use auto. > + * 'text' must be specified to > + * do crlf conversions > + */ > + if (ca->crlf_action == CRLF_TEXT) { > + if (eol_attr == EOL_LF) > + crlf_action = CRLF_TEXT_INPUT; > + else if (eol_attr == EOL_CRLF) > + crlf_action = CRLF_TEXT_CRLF; > + else if (text_eol_is_crlf()) > + crlf_action = CRLF_TEXT_CRLF; > + else > + crlf_action = CRLF_TEXT_INPUT; > + } > + ca->crlf_action = crlf_action; > + } else if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF) > ca->crlf_action = CRLF_AUTO_INPUT; > else if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_CRLF) > ca->crlf_action = CRLF_AUTO_CRLF; > @@ -1225,11 +1253,11 @@ static void convert_attrs(struct conv_attrs *ca, const char *path) > else if (eol_attr == EOL_CRLF) > ca->crlf_action = CRLF_TEXT_CRLF; > } > - ca->checkout_encoding = git_path_check_encoding(ccheck + 5); > } else { > ca->drv = NULL; > ca->crlf_action = CRLF_UNDEFINED; > ca->ident = 0; > + ca->checkout_encoding = NULL; > } > > /* Save attr and make a decision for action */ > >