Re: why do we need utf8 normalization when compare name?

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Mon, 2 Mar 2020 04:54:32 -0800

On Mon, Mar 02, 2020 at 05:00:24PM +0800, lampahome wrote:
> According to case insensitive since kernel 5.2, d_compare will
> transform string into normalized form and then compare.
> 
> But why do we need this normalization function? Could we just compare
> by utf8 string?

Have you read https://en.wikipedia.org/wiki/Unicode_equivalence ?

We need to decide whether a user with a case-insensitive filesystem
who looks up a file with the name U+00E5 (lower case "a" with ring)
should find a file which is named U+00C5 (upper case "A" with ring)
or U+212B (Angstrom sign).
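
To make that concrete, here is a rough user-space sketch in Python. It is
not the kernel's code: `fold_name` is a made-up helper, and NFD plus Unicode
case folding is only an assumed stand-in for what the filesystem does. The
three code points have three different UTF-8 encodings, but they compare
equal once decomposed and case-folded:

```python
import unicodedata


def fold_name(name: str) -> str:
    """Hypothetical helper: decompose (NFD), case-fold, then decompose again."""
    decomposed = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFD", decomposed.casefold())


# a-ring, A-ring, and the Angstrom sign
names = ["\u00e5", "\u00c5", "\u212b"]

# Distinct byte strings on disk versus distinct names after folding
raw = {n.encode("utf-8") for n in names}
folded = {fold_name(n) for n in names}

print(len(raw), len(folded))  # prints "3 1"
```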

Then there's the question of whether e-acute is stored as U+00E9
or as U+0065 followed by U+0301; either form needs to be found
by a user searching for U+00C9 or for U+0045 U+0301.
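
As an illustration only, the e-acute case can be sketched as a toy directory
lookup in Python. The names `lookup_key`, `directory` and `find` are
hypothetical, and the choice of NFD followed by case folding is an
assumption rather than a description of the kernel implementation. Note
that case folding on its own does not unify the precomposed and decomposed
spellings; the decomposition step is what does that:

```python
import unicodedata


def lookup_key(name: str) -> str:
    """Hypothetical key function: canonical decomposition, then case folding."""
    return unicodedata.normalize("NFD", name).casefold()


# Case folding alone leaves U+00E9 and U+0065 U+0301 as different strings.
casefold_only_matches = "\u00e9".casefold() == "e\u0301".casefold()

# Toy directory: one file stored in decomposed form (e + combining acute).
stored_names = ["e\u0301.txt"]
directory = {lookup_key(n): n for n in stored_names}


def find(name: str):
    """Return the stored name matching `name`, or None if there is none."""
    return directory.get(lookup_key(name))


# All four spellings of e-acute resolve to the same stored file.
queries = ["\u00e9.txt", "e\u0301.txt", "\u00c9.TXT", "E\u0301.TXT"]
results = [find(q) for q in queries]
```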

So yes, normalisation needs to be done: comparing the raw UTF-8 bytes
would treat all of these as different names.