[PATCH/RFC v5 7/7] Careful with CRLF when using e.g. UTF-16 for working-tree-encoding

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Torsten Bögershausen <tboegi@xxxxxx>

UTF-16 encoded files are treated as "binary" by Git, and no CRLF
conversion is done.
When the UTF-16 encoded files are converted into UF-8 using the new
"working-tree-encoding", the CRLF are converted if core.autocrlf is true.

This may lead to confusion:
A tool writes an UTF-16 encoded file with CRLF.
The file is commited with core.autocrlf=true, the CLRF are converted into LF.
The repo is pushed somewhere and cloned by a different user, who has
decided to use core.autocrlf=false.
He uses the same tool, and now the CRLF are not there as expected, but LF,
make the file useless for the tool.

Avoid this (possible) confusion by ignoring core.autocrlf for all files
which have "working-tree-encoding" defined.

The user can still use a .gitattributes file and specify the line endings
like "text=auto", "text", or "text eol=crlf" and let that .gitattribute
file travel together with push and clone.

Change convert.c to e more careful, simplify the initialization when
attributes are retrived (and none are specified) and update the documentation.

Signed-off-by: Torsten Bögershausen <tboegi@xxxxxx>
---
 Documentation/gitattributes.txt |  9 ++++++---
 convert.c                       | 15 ++++++++++++---
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index a8dbf4be3..3665c4677 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -308,12 +308,15 @@ Use the `working-tree-encoding` attribute only if you cannot store a file in
 UTF-8 encoding and if you want Git to be able to process the content as
 text.
 
+Note that when `working-tree-encoding` is defined, core.autocrlf is ignored.
+Set the `text` attribute (or `text=auto`) to enable CRLF conversions.
+
 Use the following attributes if your '*.txt' files are UTF-16 encoded
-with byte order mark (BOM) and you want Git to perform automatic line
-ending conversion based on your platform.
+with byte order mark (BOM) and you want Git to perform line
+ending conversion based on core.eol.
 
 ------------------------
-*.txt		text working-tree-encoding=UTF-16
+*.txt		working-tree-encoding=UTF-16 text
 ------------------------
 
 Use the following attributes if your '*.txt' files are UTF-16 little
diff --git a/convert.c b/convert.c
index 13fad490c..e7f11d1db 100644
--- a/convert.c
+++ b/convert.c
@@ -1264,15 +1264,24 @@ static void convert_attrs(struct conv_attrs *ca, const char *path)
 		}
 		ca->checkout_encoding = git_path_check_encoding(ccheck + 5);
 	} else {
-		ca->drv = NULL;
-		ca->crlf_action = CRLF_UNDEFINED;
-		ca->ident = 0;
+		memset(ca, 0, sizeof(*ca));
 	}
 
 	/* Save attr and make a decision for action */
 	ca->attr_action = ca->crlf_action;
 	if (ca->crlf_action == CRLF_TEXT)
 		ca->crlf_action = text_eol_is_crlf() ? CRLF_TEXT_CRLF : CRLF_TEXT_INPUT;
+	/*
+	 * Often UTF-16 encoded files are read and written by programs which
+	 * really need CRLF, and it is important to keep the CRLF "as is" when
+	 * files are committed with core.autocrlf=true and the repo is pushed.
+	 * The CRLF would be converted into LF when the repo is cloned to
+	 * a machine with core.autocrlf=false.
+	 * Obey the "text" and "eol" attributes and be independent on the
+	 * local core.autocrlf for all "encoded" files.
+	 */
+	if ((ca->crlf_action == CRLF_UNDEFINED) && ca->checkout_encoding)
+		ca->crlf_action = CRLF_BINARY;
 	if (ca->crlf_action == CRLF_UNDEFINED && auto_crlf == AUTO_CRLF_FALSE)
 		ca->crlf_action = CRLF_BINARY;
 	if (ca->crlf_action == CRLF_UNDEFINED && auto_crlf == AUTO_CRLF_TRUE)
-- 
2.16.0.rc0.2.g64d3e4d0cc.dirty




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux