> Unicode normalisation will take the strings "ñ" (U+00F1) and "n◌̃"
> (U+006E U+0303) and turn them into the same Unicode string.  Note that
> there are four kinds of Unicode normalisation (NFD, NFC, NFKD, NFKC), so
> what precise string you end up with depends on which form you're using.
> Linux uses NFD, I believe.
>
> And yes, once the strings are normalised and encoded as UTF-8 you then
> do a byte-by-byte comparison (if the comparison is case-insensitive then
> fs/unicode/... will case-fold the Unicode symbols during normalisation).

What I'm confused about is why the strings are encoded as UTF-8 after
normalisation has finished. From the above, "ñ" (U+00F1) and "n◌̃"
(U+006E U+0303) are turned into the same Unicode string. Then why should
we just compare the bytes of the normalised strings?
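
To make the two steps concrete, here is a small userspace sketch in Python.
It is only an illustration, not the kernel code under fs/unicode/..., and it
assumes NFD as suggested in the quoted text; the variable names are made up.
It shows that the two spellings differ before normalisation, and that after
normalisation both the code-point strings and their UTF-8 bytes are equal.

```python
import unicodedata

# Two spellings of the same visible character (names are illustrative).
precomposed = "\u00f1"    # LATIN SMALL LETTER N WITH TILDE
decomposed = "n\u0303"    # "n" followed by COMBINING TILDE

# Before normalisation they differ, as code points and as UTF-8 bytes.
print(precomposed == decomposed)        # False
print(precomposed.encode("utf-8"))      # b'\xc3\xb1'
print(decomposed.encode("utf-8"))       # b'n\xcc\x83'

# Normalise both to NFD (assumed form), then encode as UTF-8.
nfd_a = unicodedata.normalize("NFD", precomposed)
nfd_b = unicodedata.normalize("NFD", decomposed)
bytes_a = nfd_a.encode("utf-8")
bytes_b = nfd_b.encode("utf-8")

# After normalisation the code points match, and so do the bytes.
print(nfd_a == nfd_b)                   # True
print(bytes_a == bytes_b)               # True

# A different form gives a different (but again common) result.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a.encode("utf-8"), nfc_b.encode("utf-8"))
```

In this sketch, comparing the normalised strings and comparing their UTF-8
encodings give the same answer, since UTF-8 maps each code-point sequence to
exactly one byte sequence.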