Re: Bug: `git submodule` does not list modules with unicode characters

On Mon, Mar 25, 2013 at 09:30:44AM +0100, Jens Lehmann wrote:
> On 23.03.2013 17:28, Ilya Kulakov wrote:
> > The `git submodule` commands seem to ignore modules whose paths contain
> > unicode characters.
> > 
> > Consider the following steps to reproduce the problem:
> > 
> >   1. Create a directory with a name that contains at least one unicode character
> >      (e.g. "ûñïçödé-rèpø")
> > 
> >   2. Initialize a git repository within this directory
> > 
> >   3. Add this repository as a submodule to another repository so that
> >      unicode characters appear in the path to the module
> >      (e.g. "../ûñïçödé-rèpø")
> > 
> >   4. Check that the .gitmodules file is updated and contains a record
> >      for the newly added module
> > 
> >   5. List submodules using `git submodule` and find
> >      that the newly added module is not listed
> 
> Thanks for your report. It is known that git submodule does not behave
> very well when path names contain special characters. I'll look into
> that when I find some time to see if we can easily fix your problem.

I've looked into this a bit.

git ls-files will return all filenames "c-style quoted". Hence the
filename åäö will be returned as "303245303244303266". This is of course
also wrong as it should be "\303\245\303\244\303\266".

However, if we tell git ls-files to use \0 instead of \n for line
termination, we get åäö returned. So how can the choice of line termination
affect the encoding?
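
As a rough illustration of the C-style quoting discussed here (not git's
actual code), the shell sketch below turns every byte with the high bit set
into a backslash-octal escape. `c_quote` is a made-up helper name; the real
logic lives in quote.c and also escapes quotes, backslashes and control
characters, which this sketch ignores. The last command shows that a shell
`read` without `-r` drops backslashes; whether that is where a backslash-free
form comes from is only an assumption.

```shell
# c_quote: hypothetical helper that mimics C-style quoting for non-ASCII bytes only.
c_quote() {
    printf '"'
    # Dump the name as one 3-digit octal number per byte, one per line.
    printf '%s' "$1" | od -An -v -to1 | tr -s ' ' '\n' | while read -r oct; do
        [ -n "$oct" ] || continue
        if [ "$((0$oct))" -ge 128 ]; then
            printf '\\%s' "$oct"       # high-bit byte: emit \ooo
        else
            printf '%b' "\\0$oct"      # plain ASCII byte: emit as-is
        fi
    done
    printf '"\n'
}

quoted=$(c_quote 'åäö')
printf '%s\n' "$quoted"

# A read without -r treats each backslash as an escape and removes it.
printf '%s\n' "$quoted" | { read line; printf '%s\n' "$line"; }
```

With the script saved as UTF-8, the first line printed should be
"\303\245\303\244\303\266" and the second the same string without the
backslashes.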

Look in quote.c. The following patch will solve this particular problem
(but break other use cases!)

diff --git a/quote.c b/quote.c
index 911229f..2870ca5 100644
--- a/quote.c
+++ b/quote.c
@@ -284,7 +284,7 @@ void quote_two_c_style(struct strbuf *sb, const char *prefix, const char *path,
 void write_name_quoted(const char *name, FILE *fp, int terminator)
 {
 	if (terminator) {
-		quote_c_style(name, NULL, fp, 0);
+		fputs(name, fp);
 	} else {
 		fputs(name, fp);
 	}

Why don't we always print names quoted? IMHO the choice of line
termination should not do anything other than alter the line termination.

However, another solution would be to use git ls-files -z in
git-submodule.sh and then rewrite the perl code to handle \0 instead
of \n.

(The same perl-code I wanted to throw away 13 months ago but
Junio wanted to keep because perl can handle \0 and eventually -z should
be used according to him. He was right.)

However, a shortcut would be the patch below. It will work as long as
there's no newline in the filename (is that really something we want to
support? If not, let's throw away perl and stick with the sed solution
below).

diff --git a/git-submodule.sh b/git-submodule.sh
index 79bfaac..31524d3 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -113,9 +113,10 @@ resolve_relative_url ()
 module_list()
 {
 	(
-		git ls-files --error-unmatch --stage -- "$@" ||
+		git ls-files --error-unmatch --stage -z -- "$@" ||
 		echo "unmatched pathspec exists"
 	) |
+	sed -e 's/\x00/\n/g' |
 	perl -e '
 	my %unmerged = ();
 	my ($null_sha1) = ("0" x 40);
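
The idea behind this shortcut can be tried without a repository. The sketch
below is only an illustration: `fake_ls_files_z` is a made-up stand-in for
`git ls-files --stage -z`, and the mode, object names and paths in it are
invented. It uses `tr` instead of the `sed` expression from the patch because,
as far as I know, `\x00` in a sed pattern and `\n` in a sed replacement are GNU
extensions, while `tr` handles octal escapes portably.

```shell
# Stand-in for `git ls-files --stage -z`: two NUL-terminated records.
# The mode, object names and paths are invented for this example.
fake_ls_files_z() {
    printf '160000 %s 0\t%s\000' 0123456789abcdef0123456789abcdef01234567 'åäö'
    printf '160000 %s 0\t%s\000' 89abcdef0123456789abcdef0123456789abcdef 'plain'
}

# Turn NUL terminators into newlines, then keep the path (the field after the tab).
list_paths() {
    tr '\000' '\n' | cut -f2
}

fake_ls_files_z | list_paths
```

Each path comes out on its own line, unquoted. As with the patch, a path that
itself contains a newline would be split into two records.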

-- 
Best regards
Fredrik Gustafsson

tel: 0733-608274
e-mail: iveqy@xxxxxxxxx