On Tue, Apr 8, 2014 12:56 PM BST Sergei Antonov wrote:
>On 8 April 2014 04:19, Hin-Tak Leung <hintak.leung@xxxxxxxxx> wrote:
>
>> Strictly speaking, the maximum value of NLS_MAX_CHARSET_SIZE = 6
>> is not attainable in the case of conversion to UTF-8, as that
>> requires the use of surrogate pairs, i.e. consuming two storage units.
>
>True that 6 is not attainable, wrong that it is with surrogate pairs.
>A surrogate pair encodes code-points from U+10000 to U+10FFFF, which
>is 4 bytes in UTF-8 (a moderate 2 per one UTF-16 code-unit).
>
>Multiplier 3 is enough for all cases of UTF-16/UCS-2 to UTF-8 conversion.

No. The part of the commit message you skipped specifically mentioned
that conversion to a GB18030 locale can require x4. x3 is not enough.

The sentence is just a "BTW, the value of 6 can not 'usually' happen
within this...". I only put UTF-8 in there because it is widely used.
It is entirely possible for a UTF-16BE ->
"some-obscure-charset-that-only-a-few-people-in-the-world-use"
conversion scheme to hit 6.

Sigh. Remember what I said earlier about arguing about words and the
meaning of words not being constructive? The code does the correct thing.
I could re-word the commit message, and/or delete that paragraph, or
modify it to "... the use of surrogate pairs *and other means of encoding
the higher planes*, i.e. consuming *more than* two storage units...",
and you could probably go on about what "other means of encoding" is.
You could probably also go on about why it should be "two" or "more than
two", but not "more than or equal to two".

Those few words in the commit message really do not make any real
difference, and it is also quite unconstructive to argue about the
meaning of a few words out of context.

Hin-Tak
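
For anyone who wants to check the x3 versus x4 figures, here is a rough
sketch in Python (not kernel code). It assumes Python's built-in "utf-8"
and "gb18030" codecs are a fair stand-in for the kernel NLS tables, which
may not hold exactly, and the helper name max_bytes_per_utf16_unit is made
up for this illustration. It walks every Unicode scalar value and reports
the worst-case number of output bytes per UTF-16 input code unit.

```python
def max_bytes_per_utf16_unit(encoding):
    """Worst-case output bytes per UTF-16 code unit for `encoding`.

    Hypothetical helper for illustration only; it relies on Python's
    codecs, not on the kernel's NLS conversion tables.
    """
    worst = 0.0
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates are not characters on their own
        # Code points above U+FFFF take a surrogate pair (2 units) in UTF-16.
        units = 2 if cp > 0xFFFF else 1
        try:
            nbytes = len(chr(cp).encode(encoding))
        except UnicodeEncodeError:
            continue  # skip anything the target codec cannot represent
        worst = max(worst, nbytes / units)
    return worst


if __name__ == "__main__":
    for enc in ("utf-8", "gb18030"):
        print(enc, max_bytes_per_utf16_unit(enc))
```

Under those assumptions the worst case for UTF-8 comes from 3-byte BMP
characters (a surrogate pair yields 4 bytes for 2 units, i.e. 2 per unit),
while GB18030 has BMP characters that already need 4 bytes for a single
UTF-16 unit, which is why a multiplier of 3 is not sufficient in general.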