Re: why do we need utf8 normalization when compare name?

On Mon, Mar 02, 2020 at 05:00:24PM +0800, lampahome wrote:
> According to case insensitive since kernel 5.2, d_compare will
> transform string into normalized form and then compare.
> 
> But why do we need this normalization function? Could we just compare
> by utf8 string?

Have you read https://en.wikipedia.org/wiki/Unicode_equivalence ?

We need to decide whether a user with a case-insensitive filesystem
who looks up a file with the name U+00E5 (lower case "a" with ring)
should find a file which is named U+00C5 (upper case "A" with ring)
or U+212B (Angstrom sign).

Then there's the question of whether e-acute is stored as U+00E9
or as U+0065 followed by U+0301. Both of those need to be found
by a user searching for U+00C9 or for U+0045 U+0301.

So yes, normalisation needs to be done.
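
To make the cases above concrete, here is a rough userspace sketch in
Python. It is not the kernel's code (the kernel carries its own Unicode
tables), and the helper names fold_key() and same_name() are made up
for this illustration. It assumes canonical decomposition (NFD) plus
case folding as the comparison rule.

```python
import unicodedata


def fold_key(name: str) -> str:
    """Hypothetical helper: build a comparison key for a file name.

    Decompose canonically (NFD), case-fold, then decompose again,
    because case folding can produce text that is no longer normalized.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFD", decomposed.casefold())


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names the way a
    case-insensitive, normalization-aware lookup might."""
    return fold_key(a) == fold_key(b)


# "a with ring": lower case, upper case, and the Angstrom sign.
ring_names = ["\u00e5", "\u00c5", "\u212b"]

# "e with acute": precomposed and decomposed, in both cases.
acute_names = ["\u00e9", "e\u0301", "\u00c9", "E\u0301"]

# A plain code point comparison treats the two spellings of e-acute
# as different names, even though they render identically.
raw_equal = "\u00e9" == "e\u0301"

# With normalization and case folding, every name in each group matches.
ring_all_match = all(same_name(ring_names[0], n) for n in ring_names)
acute_all_match = all(same_name(acute_names[0], n) for n in acute_names)
```

Comparing the raw UTF-8 strings would miss these matches, and case
folding alone would still treat the precomposed and decomposed e-acute
as two different names.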


