Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Wed, 16 Jan 2008, Kevin Ballard wrote:
>>
>> There's a difference between "looks similar" as in "Polish" vs "polish", and
>> actually is the same string as in "Ma<UMLAUT MODIFIER>rchen" vs "M<A WITH
>> UMLAUT>rchen". Capitalization has a valid semantic meaning, normalization
>> doesn't.
>
> That simply isn't true.
>
> Normalization actually has real semantic meaning. If it didn't, there
> would never ever be a reason why you'd use the non-normalized form in
> the first place.

Actually, there is no good reason for non-normalized forms (deficient
software not able to deal with some of the normalized forms is not a
good reason: such software should be fixed).  It is just that the file
system is a rather quirky place for enforcing the normalization.  One
should not be able to get unnormalized forms created easily in the first
place, be it command line or script.

> And there *are* cases where there are distinctions. Especially inside
> computers. For one thing, you may not be talking about "characters on
> screen", but you may be talking about "key sequences". And suddenly
> "a<UMLAUT MODIFIER>" is a two-key sequence, and "<a WITH UMLAUT>" is a
> single-key sequence, and THEY ARE DIFFERENT.
>
> See?

No.  Input methods are not the same as their resulting string.  I can
even produce some ASCII characters on my keyboard in more than one way
and would not expect them to lead to different codes.

>> How do you figure? When I type "Märchen", I'm typing a string, not a
>> byte sequence. I have no control over the normalization of the
>> characters. Therefore, depending on what program I'm typing the name
>> in, I might use the same normalization as the filename, or I might
>> miss. It's completely out of my control. This is why the filesystem
>> has to step in and say "You composed that character differently, but
>> I know you were trying to specify this file".
>
> Pure and utter garbage.
>
> What you are describing is an *input method* issue, not a filesystem
> issue.
>
> The fact that you think this has anything what-so-ever to do with
> filesystems, I cannot understand.

How nice.  We are actually in agreement here.

> See? Putting the conversion in the filesystem IS INSANE. You wouldn't
> make the filesystem convert the characters in the data stream (because
> it would cause strange data conversion issues) AND FOR EXACTLY THE
> SAME REASON it shouldn't do it for filenames either!

Yup.  But that does not mean that normalization is a bad idea.  It is
just that the filesystem is not the right place for it.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
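
As an illustration of the distinction argued over in this thread, here is a
minimal Python sketch of normalizing a name at the input boundary instead of
in the filesystem. The helper name normalize_name and the choice of NFC as
the target form are assumptions made for the example; the thread itself does
not name a particular normalization form or implementation.

```python
import unicodedata

# Two spellings of the same word: decomposed (a + combining diaeresis,
# U+0308) and precomposed (a-with-diaeresis as one code point, U+00E4).
decomposed = "Ma\u0308rchen"
precomposed = "M\u00e4rchen"


def normalize_name(name: str) -> str:
    """Hypothetical helper: bring a user-supplied name to NFC before use.

    NFC is an assumption for this sketch; any single agreed-upon form
    would serve the same purpose.
    """
    return unicodedata.normalize("NFC", name)


# As code-point sequences (and as UTF-8 bytes) the two spellings differ.
print(decomposed == precomposed)
print(decomposed.encode("utf-8") == precomposed.encode("utf-8"))

# After normalization at the input boundary they compare equal.
print(normalize_name(decomposed) == normalize_name(precomposed))

# Capitalization is untouched by normalization: these stay distinct.
print(normalize_name("Polish") == normalize_name("polish"))
```

The normalization happens in the program that accepts the name, so whatever
is handed to the filesystem afterwards is an ordinary byte sequence that the
filesystem stores unchanged.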