Re: [PATCH v7] git on Mac OS and precomposed unicode

Torsten Bögershausen <tboegi@xxxxxx> writes:

> +core.precomposedunicode::
> +	...
> +	When false, file names are handled fully transparent by git, which means
> +	that file names are stored as decomposed unicode in the repository.

I do not think it means any such thing.

We just take whatever the platform throws at us and shove that in
the repository.  On MacOS X with HFS+, it may be decomposed UTF-8,
but we do not even try to ensure everything (like the path added by
somebody else on a BSD system in a commit that you fetched) is in a
particular encoding.
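To make the point concrete: "é" can arrive as one precomposed code point (U+00E9) or as "e" followed by U+0301 COMBINING ACUTE ACCENT, which is the decomposed form HFS+ hands back. The two spellings are different byte strings, so when git stores whatever bytes it is given, the same visible name can end up as two distinct paths. A minimal C sketch; `precomposed`, `decomposed` and `same_path_bytes` are made-up names, and plain strcmp() stands in for git's byte-wise path comparison.

```c
#include <string.h>

/* "café" with U+00E9 as one precomposed code point (NFC): 5 bytes */
static const char precomposed[] = "caf\xc3\xa9";

/* "café" as 'e' + U+0301 COMBINING ACUTE ACCENT (NFD, the form HFS+ returns): 6 bytes */
static const char decomposed[] = "cafe\xcc\x81";

/* Hypothetical helper: compare path names as plain byte strings. */
static int same_path_bytes(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

Both strings render as "café", yet same_path_bytes() reports them as different paths.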

> diff --git a/Makefile b/Makefile
> index f62ca2a..55ceb10 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -607,6 +607,7 @@ LIB_H += compat/bswap.h
>  LIB_H += compat/cygwin.h
>  LIB_H += compat/mingw.h
>  LIB_H += compat/obstack.h
> +LIB_H += compat/precomposed_utf8.h

Micronit.  Shouldn't these all be called "precompose_utf8"
throughout the patch?

We are asking Git to "please normalize by precomposing any UTF-8
pathnames" when we give the -DPRECOMPOSE_UNICODE C-preprocessor
macro, and compat/precompose_utf8.[ch] implement the machinery to do
so.
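"Precomposing" means replacing a base letter followed by a combining mark with the single code point that represents both. The sketch below is a toy that knows only five vowels followed by a combining acute accent; the table and `toy_precompose` are hypothetical, and real machinery needs the full Unicode composition data (including runs of several combining marks), which this does not attempt.

```c
#include <string.h>

/*
 * Toy table (hypothetical, illustration only): base letter followed by
 * U+0301 COMBINING ACUTE ACCENT (UTF-8 bytes 0xCC 0x81) maps to a
 * precomposed code point in U+00E1..U+00FA (UTF-8 lead byte 0xC3).
 */
static const struct {
	char base;
	unsigned char second;	/* second byte of the precomposed UTF-8 sequence */
} acute_table[] = {
	{ 'a', 0xa1 },	/* U+00E1 */
	{ 'e', 0xa9 },	/* U+00E9 */
	{ 'i', 0xad },	/* U+00ED */
	{ 'o', 0xb3 },	/* U+00F3 */
	{ 'u', 0xba },	/* U+00FA */
};

/*
 * Precompose `in` into `out`, which must hold strlen(in) + 1 bytes; the
 * result is never longer, since 3 input bytes become 2. Sequences not in
 * the table are copied through unchanged.
 */
static void toy_precompose(const char *in, char *out)
{
	while (*in) {
		size_t i;
		int done = 0;

		/* in[1] is safe because *in != 0; in[2] only if in[1] != 0 */
		if ((unsigned char)in[1] == 0xcc && (unsigned char)in[2] == 0x81) {
			for (i = 0; i < sizeof(acute_table) / sizeof(acute_table[0]); i++) {
				if (in[0] == acute_table[i].base) {
					*out++ = (char)0xc3;
					*out++ = (char)acute_table[i].second;
					in += 3;
					done = 1;
					break;
				}
			}
		}
		if (!done)
			*out++ = *in++;
	}
	*out = '\0';
}
```

For example, toy_precompose("cafe\xcc\x81", buf) leaves the 5-byte precomposed "café" in buf, while names without a known decomposed sequence pass through untouched.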

> diff --git a/compat/precomposed_utf8.c b/compat/precomposed_utf8.c
> new file mode 100644
> index 0000000..14bb0ce
> --- /dev/null
> +++ b/compat/precomposed_utf8.c
> @@ -0,0 +1,189 @@
> +/* Converts filenames from decomposed unicode into precomposed unicode.
> +   Used on MacOS X.
> +*/

Micronit.

	/*
	 * Multi-line comments begin with slash, asterisk, newline,
	 * and end with a run of SP to align the asterisk, then
	 * asterisk, slash, newline, like this.
	 */
> +#define __PRECOMPOSED_UNICODE_C__
> +
> +#include "cache.h"
> +#include "utf8.h"
> +#include "precomposed_utf8.h"


> +#include "stdio.h"

You shouldn't need "stdio.h"; you already get <stdio.h> through
"git-compat-util.h", which "cache.h" includes.

> diff --git a/compat/precomposed_utf8.h b/compat/precomposed_utf8.h
> new file mode 100644
> index 0000000..708a1c6
> --- /dev/null
> +++ b/compat/precomposed_utf8.h
> ...
> +#ifndef __PRECOMPOSED_UNICODE_C__
> +#define dirent dirent_prec_psx
> +#define opendir(n) precomposed_utf8_opendir(n)
> +#define readdir(d) precomposed_utf8_readdir(d)
> +#define closedir(d) precomposed_utf8_closedir(d)
> +#define DIR PREC_DIR
> +#endif /* __PRECOMPOSED_UNICODE_C__ */

Hrm, this is not wrong per se, but looks somewhat unwieldy.
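For reference, the quoted header uses macro redirection: every file except the one implementing the wrappers sees opendir(), readdir(), closedir() and DIR renamed to wrapper versions, while the implementation file opts out by defining its guard macro before the include. A single-file sketch of that idea, assuming POSIX <dirent.h>; `WRAP_DIR` and the `wrap_*` functions are hypothetical stand-ins, the `dirent` redefinition is omitted, and the wrapper passes names through without converting them. Here source order plays the role of the guard macro.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* --- wrapper implementation: still sees the real opendir()/readdir() --- */
typedef struct {
	DIR *real;	/* the platform's directory handle */
} WRAP_DIR;

static WRAP_DIR *wrap_opendir(const char *name)
{
	WRAP_DIR *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->real = opendir(name);
	if (!d->real) {
		free(d);
		return NULL;
	}
	return d;
}

static struct dirent *wrap_readdir(WRAP_DIR *d)
{
	/* A real wrapper would convert d_name to precomposed form here. */
	return readdir(d->real);
}

static int wrap_closedir(WRAP_DIR *d)
{
	int ret = closedir(d->real);

	free(d);
	return ret;
}

/* --- what the header does for every other file --- */
#define opendir(n)  wrap_opendir(n)
#define readdir(d)  wrap_readdir(d)
#define closedir(d) wrap_closedir(d)
#define DIR WRAP_DIR

/* Unmodified caller code now transparently goes through the wrappers. */
static int dir_contains(const char *path, const char *name)
{
	DIR *dir = opendir(path);
	struct dirent *e;
	int found = 0;

	if (!dir)
		return -1;
	while ((e = readdir(dir)) != NULL)
		if (!strcmp(e->d_name, name))
			found = 1;
	closedir(dir);
	return found;
}
```

One likely source of the unwieldiness: once `DIR` and `opendir` are macros, a <dirent.h> included for the first time afterwards would have its own declarations rewritten too, so the redirecting header has to come after the system headers.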

> +#define  __PRECOMPOSED_UNICODE_H__
> +#endif /* __PRECOMPOSED_UNICODE_H__ */

> diff --git a/utf8.c b/utf8.c
> index 8acbc66..a544f15 100644
> --- a/utf8.c
> +++ b/utf8.c
> @@ -433,19 +433,12 @@ int is_encoding_utf8(const char *name)
> ...
> @@ -478,6 +470,20 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e
>  			break;
>  		}
>  	}
> +	return out;
> +}
> +
> +char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding)
> +{
> +	iconv_t conv;
> +	char *out;
> +
> +	if (!in_encoding)
> +		return NULL;
> +	conv = iconv_open(out_encoding, in_encoding);
> +	if (conv == (iconv_t) -1)
> +		return NULL;
> +	out = reencode_string_iconv(in, strlen(in), conv);
>  	iconv_close(conv);
>  	return out;
>  }

Much nicer ;-).
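The split pays off because an iconv descriptor can be opened once and reused: the hunk's `reencode_string_iconv(in, strlen(in), conv)` call takes a descriptor the caller already owns, so a caller converting many names (presumably every entry a readdir() wrapper returns) need not pay for iconv_open()/iconv_close() per string. A sketch of such a helper, assuming POSIX iconv() with the glibc `char **` input-pointer prototype; `convert_with` is a hypothetical name, not the patch's function, and it omits the final flush a stateful encoding would need.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert `insz` bytes of `in` with a descriptor the
 * caller already opened. Returns a malloc'd NUL-terminated string, or NULL.
 */
static char *convert_with(iconv_t conv, const char *in, size_t insz)
{
	size_t outalloc = insz + 1, outsz, done;
	char *out = malloc(outalloc), *outpos, *grown;
	char *cp = (char *)in;	/* iconv() takes a non-const input pointer */

	if (!out)
		return NULL;
	iconv(conv, NULL, NULL, NULL, NULL);	/* reset shift state before reuse */
	outpos = out;
	outsz = outalloc - 1;	/* keep one byte for the trailing NUL */
	while (iconv(conv, &cp, &insz, &outpos, &outsz) == (size_t)-1) {
		if (errno != E2BIG) {	/* invalid or incomplete input */
			free(out);
			return NULL;
		}
		/* Output buffer too small: double it and continue where we stopped. */
		done = outpos - out;
		outalloc *= 2;
		grown = realloc(out, outalloc);
		if (!grown) {
			free(out);
			return NULL;
		}
		out = grown;
		outpos = out + done;
		outsz = outalloc - done - 1;
	}
	*outpos = '\0';
	return out;
}
```

Usage: open the descriptor once, for example with iconv_open("UTF-8", "ISO-8859-1"), call convert_with() for each string, and iconv_close() it at the end.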