On Jan 21, 2008, at 3:56 PM, Dmitry Potapov wrote:
> On Mon, Jan 21, 2008 at 02:05:51PM -0500, Kevin Ballard wrote:
>>> But that is *entirely* a separate issue from "normalization". Kevin,
>>> you seem to think that normalization is somehow forced on you by the
>>> "text-as-codepoints" decision, and that is SIMPLY NOT TRUE.
>>> Normalization is a totally separate decision, and it's a STUPID one,
>>> because it breaks so many of the _nice_ properties of using UTF-8.
>>
>> I'm not saying it's forced on you, I'm saying when you treat
>> filenames as text,
>
> To treat as text could mean different things to different people. Some
> may prefer "fi" and the fi ligature to be treated as the same in some
> contexts.
Those people can use NFKC/NFKD (compatibility equivalence). As I've said before, I'm talking about canonical equivalence, because that doesn't lose information the way compatibility equivalence does (e.g. the fi ligature gets turned into "fi" under compatibility equivalence, but not under canonical equivalence).
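To make the distinction concrete, here is a small sketch using Python's standard unicodedata module: the canonical forms (NFC/NFD) leave the fi ligature alone, while the compatibility forms (NFKC/NFKD) fold it into the two letters "fi".

```python
import unicodedata

ligature = "\ufb01"  # U+FB01 LATIN SMALL LIGATURE FI

# Canonical normalization preserves the ligature as-is...
assert unicodedata.normalize("NFC", ligature) == ligature
assert unicodedata.normalize("NFD", ligature) == ligature

# ...but compatibility normalization rewrites it to plain "fi",
# discarding the information that a ligature was ever there.
assert unicodedata.normalize("NFKC", ligature) == "fi"
assert unicodedata.normalize("NFKD", ligature) == "fi"

print("ok")
```

That lossiness is exactly why compatibility equivalence is the wrong tool for filenames, and why the discussion here is about canonical equivalence only.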
>> it DOESN'T MATTER if the string gets normalized. As long as the
>> string remains equivalent,
>
> As a matter of fact it does; otherwise the characters would be the same
> and we would not be having this conversation at all. A string can be
> equivalent and not equivalent at the same time, because there are
> different equivalence relations. Finally, what HFS+ does is not even
> normalization. In the technote, Apple explains that they decompose some
> characters but not others for better compatibility. So, you see, there
> is a PROBLEM here.
Again, I've specified many times that I'm talking about canonical equivalence.
And yes, HFS+ does do normalization; it just doesn't use stock NFD, it uses a custom variant of it. I fail to see how this is a problem.
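Python doesn't expose the HFS+ variant itself, but plain NFD is close enough to illustrate what decomposition does to a name like "café": the precomposed and decomposed spellings are different code point (and byte) sequences, yet canonically equivalent.

```python
import unicodedata

precomposed = "caf\u00e9"   # "café" with precomposed U+00E9
decomposed = "cafe\u0301"   # "café" as "e" + U+0301 COMBINING ACUTE ACCENT

# Different code point sequences, and different UTF-8 bytes...
assert precomposed != decomposed
assert precomposed.encode("utf-8") != decomposed.encode("utf-8")

# ...but canonically equivalent: NFD maps both to the same string,
# which is (roughly) the form an HFS+-style filesystem stores on disk.
assert unicodedata.normalize("NFD", precomposed) == decomposed
assert unicodedata.normalize("NFD", decomposed) == decomposed

print("ok")
```

The real HFS+ tables differ from stock NFD in a few ranges (that's the "custom variant" above), but the canonical-equivalence behavior is the same idea.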
>> Alright, fine. I'm not saying HFS+ is right in storing the normalized
>> version, but I do believe the authors of HFS+ must have had a reason
>> to do that,
>
> I don't say they did that without *any* reason. I suppose all the Apple
> developers in the Copland project had some reasons for what they did,
> but the outcome was not very good...
Stupid engineers don't get to work on developing new filesystems. And Copland didn't fail because of stupid engineers anyway. If I had to blame someone, I'd blame management.
>> The only information you lose when doing canonical normalization is
>> what the original byte sequence was.
>
> Not true. You lose the original sequence of *characters*.
Which is only a problem if you care about the byte sequence, which is kinda the whole point of my argument.
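The disagreement above can be shown directly: normalization is a many-to-one mapping, so once a name is normalized, both the original byte sequence and the original code point sequence are unrecoverable.

```python
import unicodedata

a = "\u00e9"     # é precomposed: UTF-8 bytes c3 a9
b = "e\u0301"    # e + combining acute: UTF-8 bytes 65 cc 81

# Two distinct character sequences, two distinct byte sequences...
assert a != b
assert a.encode("utf-8") != b.encode("utf-8")

# ...collapse to a single normalized form, so after normalization you
# cannot tell which of the two spellings you started with.
assert unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print("ok")
```

Whether that loss matters is precisely the question: if you only compare names up to canonical equivalence, nothing observable is lost; if you care about the exact bytes, everything is.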
-Kevin Ballard

--
Kevin Ballard
http://kevin.sb.org
kevin@xxxxxx
http://www.tildesoft.com