Re: On pathnames

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Fri, 25 Jan 2008 00:36:29 +0000 (GMT)

Hi,

On Thu, 24 Jan 2008, Junio C Hamano wrote:

> [A nice, concise, well written and obviously thought-through summary of 
>  the case sensitivity and UTF-8 file name issues.]

Thank you Junio.  It must have taken much more time than just sitting
down and hacking away at the keyboard.  By thinking before writing, you
invested time that you saved all the readers, including me.  I
appreciate that very much.

> [Goes on to describe what we do with symlinks when the filesystem is not 
>  capable of representing symlinks; compares that situation to the 
>  filenames situation.]

There is a fundamental difference between the symlinks situation and the 
filename situation that you should keep in mind:  even if the filesystem 
cannot create symlinks, the nature of filenames as unique keys is not 
changed.  You cannot have a symlink and a file of the same name.  In a 
way, it takes away a degree of freedom of the _values_ that the _keys_ 
point to.

The same is not true for the case-challenged filesystems; they change the 
nature from unique keys to semi-unique keys.  So while other filesystems 
can discern all different keys, these challenged filesystems cannot; they 
take away a degree of freedom of the _keys_.

It is much easier to cope with the loss of a degree of freedom in the
values; you have to store the metadata somewhere else -- in this case the
index -- but it is still easily accessible by the key.

But that is not possible if two different _keys_ are not accepted as 
different by the filesystem.  You can still store the different metadata 
in the index, but the _content_ cannot be in the filesystem under the 
desired keys; not at the same time, anyway.
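
To make the keys-versus-values distinction concrete, here is a minimal
sketch that models a directory as a dictionary.  The class ToyDirectory
and its fold parameter are made up for illustration; this is not how Git
or any real filesystem works, and str.lower() is only a stand-in for real
folding rules.

```python
class ToyDirectory:
    """Toy model of a directory: a mapping from name to content.

    `fold` maps a name to the key the filesystem really stores it under.
    Both the class and `fold` are hypothetical, for illustration only.
    """

    def __init__(self, fold=lambda name: name):
        self.fold = fold
        self.entries = {}

    def write(self, name, content):
        self.entries[self.fold(name)] = content

    def read(self, name):
        return self.entries[self.fold(name)]


# Case-sensitive filesystem: two distinct keys, two distinct contents.
sane = ToyDirectory()
sane.write("xt_CONNMARK.c", "content A")
sane.write("xt_connmark.c", "content B")

# Case-folding filesystem: both names collapse into one key, so the
# second write silently replaces the first.
challenged = ToyDirectory(fold=str.lower)
challenged.write("xt_CONNMARK.c", "content A")
challenged.write("xt_connmark.c", "content B")

print(len(sane.entries))                 # 2
print(len(challenged.entries))           # 1
print(challenged.read("xt_CONNMARK.c"))  # content B
```

A filesystem without symlinks would only restrict what can be stored as
the value; every name would still map to its own entry.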

> Perhaps we could have something like:
> 
> 	$ git show :xt_CONNMARK.c >xt_connmark-1.c
> 	$ edit xt_connmark-1.c
> 	$ git add --as xt_CONNMARK.c xt_connmark-1.c

Something similar is already possible:

	$ git checkout xt_CONNMARK.c
	$ edit xt_CONNMARK.c
	$ git add xt_CONNMARK.c

but you have to keep in mind that

	- "git add -u" or "git commit -a" is a no-no-no, and
	- the system will not build, no matter what you change in git

on those filesystems.

Having said that, I think that a config variable or a commit hook for
those repositories which _happen_ to live on sane filesystems, but have to
be checked out on challenged ones, makes absolute sense.  (The commit hook
is possible already, but less efficient than the config variable.)
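
As a rough idea of what such a hook could check, here is a sketch that
looks for paths in the index which differ only by case.  The function
names are made up; "git ls-files -z" is the only Git command used, and
str.casefold() is an assumption standing in for whatever folding the
target filesystem really applies.

```python
import subprocess
from collections import defaultdict


def index_paths():
    """Return the paths currently in the index, via `git ls-files -z`."""
    out = subprocess.run(
        ["git", "ls-files", "-z"],
        check=True, capture_output=True,
    ).stdout
    return [p.decode("utf-8", "surrogateescape")
            for p in out.split(b"\0") if p]


def case_collisions(paths):
    """Group paths that differ only by case (per str.casefold()).

    Hypothetical helper: returns a sorted list of sorted groups, each
    holding two or more paths that a case-folding filesystem could not
    keep apart.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

A pre-commit hook built on this would call case_collisions(index_paths())
and exit with a non-zero status when the result is not empty.  It has to
scan the whole index on every commit, which is the inefficiency mentioned
above.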

> If it is a new file, we won't find any name that is equivalent to $A in 
> the index, and we use the name $A obtained from readdir(3).
> 
> BUT with a twist.
> 
> If the filesystem is known to be inconveniently case folding, we are 
> better off registering $B instead of $A (assuming we can convert from $A 
> to $B).

I tend to agree with Nico.  We should not "learn" from the challenged 
filesystems.

> Tasks
> -----
> 
>  - Identify which case folding filesystems need to be supported,
>    and make sure somebody understands its folding logic;
> 
>  - For each supported case folding logic, these are needed:
> 
>    - a hash function that throws "equivalent" names in the same
>      bucket, to be used in Linus's patch;

AFAIR Linus wanted to have one hash function to rule them all.  That would
be way cool, since it means fewer possibilities for bugs to go undetected.
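
To show the general shape of such a function, here is a sketch that
canonicalizes a name first and hashes the result, so that "equivalent"
names land in the same bucket.  This is not the hash from Linus's patch:
fold_name() and name_hash() are made-up names, the canonical form
(NFD, case fold, NFD again) is an assumption about what "equivalent"
should mean, and FNV-1a is picked only because it is short.

```python
import unicodedata


def fold_name(name):
    """Hypothetical canonical form: NFD, then case fold, then NFD again."""
    decomposed = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFD", decomposed.casefold())


def name_hash(name):
    """32-bit FNV-1a over the UTF-8 bytes of the folded name."""
    h = 0x811C9DC5                          # FNV offset basis
    for byte in fold_name(name).encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF   # FNV prime, kept to 32 bits
    return h


# Names differing only in case share a bucket ...
print(name_hash("xt_CONNMARK.c") == name_hash("xt_connmark.c"))  # True
# ... and so do precomposed and decomposed spellings of the same name.
print(name_hash("M\u00e4rchen") == name_hash("Ma\u0308rchen"))   # True
```

The hash only narrows the candidates; a compare function still has to
confirm that two names in the same bucket are really equivalent.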

>    - a compare function to determine equivalent names;

AFAICT we need three functions: strcasecmp(), utf8_strcmp() and 
utf8_strcasecmp().  Although I might be wrong, and the second is not 
needed.
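
For the sake of discussion, here is a sketch of what the three could
mean.  Only the names come from the list above; the semantics are my
assumption: strcasecmp() folds ASCII case only, utf8_strcmp() ignores
normalization differences but not case, and utf8_strcasecmp() ignores
both.  All three take UTF-8 byte strings and return a negative number,
zero or a positive number.

```python
import unicodedata


def _cmp(a, b):
    """Three-way comparison: negative, zero or positive."""
    return (a > b) - (a < b)


def _decode_nfd(raw):
    """Decode UTF-8 bytes and bring the text into NFD."""
    return unicodedata.normalize("NFD", raw.decode("utf-8"))


def strcasecmp(a, b):
    """Case-insensitive for ASCII letters only; bytes.lower() does that."""
    return _cmp(a.lower(), b.lower())


def utf8_strcmp(a, b):
    """Case-sensitive, but blind to normalization differences."""
    return _cmp(_decode_nfd(a), _decode_nfd(b))


def utf8_strcasecmp(a, b):
    """Blind to both case and normalization differences."""
    def fold(raw):
        return unicodedata.normalize("NFD", _decode_nfd(raw).casefold())
    return _cmp(fold(a), fold(b))


nfc = "M\u00e4rchen".encode("utf-8")   # precomposed a-umlaut
nfd = "Ma\u0308rchen".encode("utf-8")  # "a" plus combining diaeresis

print(strcasecmp(b"xt_CONNMARK.c", b"xt_connmark.c"))  # 0
print(utf8_strcmp(nfc, nfd))                           # 0
print(utf8_strcasecmp(nfc.upper(), nfd))               # not 0: ASCII-only upper()
```

Under these assumptions the second function would matter only for a
filesystem that normalizes names but preserves case distinctions.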

Probably the answer to this has been buried in many, many lines that I
decided not to read.  Maybe I'll ask Randal on IRC, he's usually very 
quick to give me reasonable and concise answers.  And then we trash-talk a 
little, just for fun.

Ciao,
Dscho
