2011/11/24 Günter Kukkukk <linux@xxxxxxxxxxx>:
> On Wednesday 23 November 2011 19:00:16 Amit Sahrawat wrote:
>> Hi Alan,
>> Ok, translations cannot be added easily. But any idea why surrogate
>> pairs are not handled? I think handling for surrogate pairs can be
>> added by identifying proper points (there are not many I guess). Please
>> share your views.
>>
>> Regards,
>> Amit Sahrawat
>>
>> On Wed, Nov 23, 2011 at 10:42 PM, Alan Cox <alan@xxxxxxxxxxxxxxxxxxx> wrote:
>> > On Wed, 23 Nov 2011 11:31:47 -0500 (EST)
>> >
>> > Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
>> >> On Wed, 23 Nov 2011, NamJae Jeon wrote:
>> >> > Hi. Alan.
>> >> > Would you know why there is no upper/lower case table in nls utf8 ?
>> >> > And Currently Surrogate pair is not supported also in nls utf8. Is
>> >> > there the reason ?
>> >>
>> >> I don't know.
>> >
>> > For one case translations are locale specific and very very complicated.
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-kernel"
>> > in the body of a message to majordomo@xxxxxxxxxxxxxxx
>> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>> > Please read the FAQ at http://www.tux.org/lkml/
>
> "Surrogate pairs" had to be implemented to extend the former
> 16 bit limit of UCS-2/UTF-16.
>
> Unicode has been limited to a maximum code point of 0x0010FFFF - which
> would not fit in UCS-2/UTF-16.
>
> To extend UTF-16, the "surrogate range" between D800 and DFFF was "stolen"
> from one of the previously named "Private Use Areas" of UCS-2.
> -----
>
> Have those "surrogate pairs" any impact on _today's_ linux file name conventions?
>
> I think the easy answer is NO !
>
> AFAIK - _no_ current operating system is supporting this!
>
> We are talking here about "allowed dir/file name characters"!

How about Chinese/Japanese/Korean characters? Without surrogate pair
support, users won't be able to create new files whose names contain
CJK/Han characters that lie outside the BMP.
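
To make that concrete, here is a rough userspace sketch (not kernel code;
the helper names needs_surrogate_pair() and to_surrogates() are made up for
illustration). As far as I understand, the common ideographs in
U+4E00..U+9FFF fit in one UTF-16 unit, while supplementary-plane ones such
as CJK Extension B (starting at U+20000) need a pair:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if cp lies above the BMP and therefore
 * needs two UTF-16 code units (a surrogate pair). */
static int needs_surrogate_pair(uint32_t cp)
{
	return cp >= 0x10000 && cp <= 0x10FFFF;
}

/* Hypothetical helper: split a supplementary-plane code point into its
 * high (D800..DBFF) and low (DC00..DFFF) surrogates.
 * Returns 0 on success, -1 if cp does not need a pair. */
static int to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
	assert(hi != NULL && lo != NULL);
	if (!needs_surrogate_pair(cp))
		return -1;
	cp -= 0x10000;                          /* 20 bits remain */
	*hi = (uint16_t)(0xD800 | (cp >> 10));  /* top 10 bits */
	*lo = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* bottom 10 bits */
	return 0;
}
```

With this, U+4E2D is reported as needing no pair, while U+20000 splits
into D840 DC00. A file name containing the latter is the case that
breaks when the conversion code only handles single 16 bit units.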
>
> The main reason behind "Surrogate pairs" was to allow "userland" (!)
> applications to use worldwide special character glyphs!
> ---------
>
> Anyway - in nls_base.c
> .....
> static const struct utf8_table utf8_table[] =
> {
>     {0x80,  0x00,   0*6,    0x7F,           0,         /* 1 byte sequence */},
>     {0xE0,  0xC0,   1*6,    0x7FF,          0x80,      /* 2 byte sequence */},
>     {0xF0,  0xE0,   2*6,    0xFFFF,         0x800,     /* 3 byte sequence */},
>     {0xF8,  0xF0,   3*6,    0x1FFFFF,       0x10000,   /* 4 byte sequence */},
>     {0xFC,  0xF8,   4*6,    0x3FFFFFF,      0x200000,  /* 5 byte sequence */},
>     {0xFE,  0xFC,   5*6,    0x7FFFFFFF,     0x4000000, /* 6 byte sequence */},
>     {0,                                                /* end of table    */}
> };
> ........
> that configured range exceeds the max. allowed unicode range 0x0010FFFF
> and _must_ be changed to:
>
> static const struct utf8_table utf8_table[] =
> {
>     {0x80,  0x00,   0*6,    0x7F,           0,         /* 1 byte sequence */},
>     {0xE0,  0xC0,   1*6,    0x7FF,          0x80,      /* 2 byte sequence */},
>     {0xF0,  0xE0,   2*6,    0xFFFF,         0x800,     /* 3 byte sequence */},
>     {0xF8,  0xF0,   3*6,    0x1FFFFF,       0x10000,   /* 4 byte sequence */},
>     {0,                                                /* end of table    */}
> };
>
> Cheers, Günter
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
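
One note on the shortened table quoted above: the 4 byte row still carries
the mask 0x1FFFFF, so dropping the 5 and 6 byte rows alone would seem to
leave 0x110000..0x1FFFFF decodable. Below is a rough userspace sketch of a
table-driven decoder with an explicit upper-bound check. It is only
modelled on the quoted table; the function name sketch_utf8_decode() is
hypothetical, the struct layout is my assumption, and rejecting D800..DFFF
is an extra choice of this sketch, not something taken from nls_base.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, inferred from the five columns of the quoted table. */
struct utf8_table {
	int  cmask;   /* mask selecting the lead-byte prefix bits   */
	int  cval;    /* expected prefix value for this length      */
	int  shift;   /* unused here, kept to mirror the table      */
	long lmask;   /* mask for the decoded payload bits          */
	long lval;    /* smallest value allowed (rejects overlongs) */
};

static const struct utf8_table utf8_table[] = {
	{0x80, 0x00, 0*6, 0x7F,     0},        /* 1 byte sequence */
	{0xE0, 0xC0, 1*6, 0x7FF,    0x80},     /* 2 byte sequence */
	{0xF0, 0xE0, 2*6, 0xFFFF,   0x800},    /* 3 byte sequence */
	{0xF8, 0xF0, 3*6, 0x1FFFFF, 0x10000},  /* 4 byte sequence */
	{0, 0, 0, 0, 0}                        /* end of table    */
};

/* Decode one code point from s (n bytes available).
 * Returns the number of bytes consumed, or -1 on malformed input. */
static int sketch_utf8_decode(uint32_t *out, const unsigned char *s, size_t n)
{
	const struct utf8_table *t;
	uint32_t l;
	int c0, c;
	size_t nc = 0;

	assert(out != NULL && s != NULL);
	if (n == 0)
		return -1;
	c0 = *s;
	l = (uint32_t)c0;
	for (t = utf8_table; t->cmask; t++) {
		nc++;
		if ((c0 & t->cmask) == t->cval) {
			l &= (uint32_t)t->lmask;
			if (l < (uint32_t)t->lval)
				return -1;      /* overlong encoding */
			if (l > 0x10FFFF)
				return -1;      /* beyond the Unicode range */
			if (l >= 0xD800 && l <= 0xDFFF)
				return -1;      /* lone surrogate (sketch's choice) */
			*out = l;
			return (int)nc;
		}
		if (n <= nc)
			return -1;              /* truncated sequence */
		s++;
		c = (*s ^ 0x80) & 0xFF;
		if (c & 0xC0)
			return -1;              /* not a continuation byte */
		l = (l << 6) | (uint32_t)c;
	}
	return -1;                              /* 5/6 byte lead bytes */
}
```

With this sketch, F0 A0 80 80 decodes to U+20000 in 4 bytes, while
F4 90 80 80 (which would be 0x110000) is rejected only because of the
explicit range check, not because of the table itself.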