On Thu, Nov 23, 2017 at 10:36:37AM -0800, Viacheslav Dubeyko wrote:
> On Thu, 2017-11-23 at 08:32 -0300, Ernesto A. Fernández wrote:
> > Hi:
> >
> > your issue seems to be in the decomposition of hangul characters, not in
> > the recomposition before printing. The hfsplus module on linux is saving
> > the name of your actor as AC F5 C7 20, without performing any
> > decomposition at all.
> >
> > The reason your patch hides the bug is because it causes linux to present
> > filenames as decomposed utf8, so it is not necessary to decompose again
> > before working with them. But the issue is still there, and you will most
> > likely run into trouble if you make a hangul filename in linux and try
> > to work with it in MacOS.
> >
> > Reviewing the code it would seem that the developers completely forgot
> > the hangul characters had their own rules for decomposition. It's weird
> > because they did the composition part correctly.
> >
> > I've made a quick draft of a patch, mostly by copying the code provided
> > in the unicode web. I don't think we can actually use it on a
>
> Could you please share the link for "the unicode web"?
>
> Thanks,
> Vyacheslav Dubeyko.

I'm not asking for any reviews yet, just testing because I don't have a Mac.
As long as that's clear, this is the latest version of Unicode:

www.unicode.org/versions/Unicode10.0.0/

You want section 3.12.

> > release,
> > but it should be enough to check if I'm right. It works fine on linux,
> > but I don't have a mac, so it would be great if you could test it for
> > me.
> >
> > Thanks,
> > Ernest
> >
> > (By the way, there is no reason you should have to use the nodecompose
> > mount option, as the other reviewer suggested. Using that option will
> > have a similar effect to that of your patch. It will hide the problem,
> > but if you create a hangul filename on linux with that option you
> > probably won't be able to use it on a mac.)
> >
> > ---
> > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c
> > index dfa90c2..9006c61 100644
> > --- a/fs/hfsplus/unicode.c
> > +++ b/fs/hfsplus/unicode.c
> > @@ -272,7 +272,7 @@ static inline int asc2unichar(struct super_block *sb, const char *astr, int len,
> >  	return size;
> >  }
> >
> > -/* Decomposes a single unicode character. */
> > +/* Decomposes a single non-Hangul unicode character. */
> >  static inline u16 *decompose_unichar(wchar_t uc, int *size)
> >  {
> >  	int off;
> > @@ -296,6 +296,29 @@ static inline u16 *decompose_unichar(wchar_t uc, int *size)
> >  	return hfsplus_decompose_table + (off / 4);
> >  }
> >
> > +/* Decomposes a Hangul unicode character. */
> > +int decompose_hangul(wchar_t uc, u16 *result)
> > +{
> > +	int index;
> > +	int l, v, t;
> > +
> > +	index = uc - Hangul_SBase;
> > +	if (index < 0 || index >= Hangul_SCount)
> > +		return 0;
> > +
> > +	l = Hangul_LBase + index / Hangul_NCount;
> > +	v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount;
> > +	t = Hangul_TBase + index % Hangul_TCount;
> > +
> > +	result[0] = l;
> > +	result[1] = v;
> > +	if (t != Hangul_TBase) {
> > +		result[2] = t;
> > +		return 3;
> > +	}
> > +	return 2;
> > +}
> > +
> >  int hfsplus_asc2uni(struct super_block *sb,
> >  		    struct hfsplus_unistr *ustr, int max_unistr_len,
> >  		    const char *astr, int len)
> > @@ -303,15 +326,23 @@ int hfsplus_asc2uni(struct super_block *sb,
> >  	int size, dsize, decompose;
> >  	u16 *dstr, outlen = 0;
> >  	wchar_t c;
> > +	u16 hangul_buf[3];
> >
> >  	decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
> >  	while (outlen < max_unistr_len && len > 0) {
> >  		size = asc2unichar(sb, astr, len, &c);
> >
> > -		if (decompose)
> > -			dstr = decompose_unichar(c, &dsize);
> > -		else
> > +		if (decompose) {
> > +			/* Hangul is handled separately */
> > +			dstr = &hangul_buf[0];
> > +			dsize = decompose_hangul(c, dstr);
> > +			if (dsize == 0)
> > +				/* not Hangul */
> > +				dstr = decompose_unichar(c, &dsize);
> > +		} else {
> >  			dstr = NULL;
> > +		}
> > +
> >  		if (dstr) {
> >  			if (outlen + dsize > max_unistr_len)
> >  				break;
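
For anyone who wants to check the arithmetic without mounting anything, here
is a minimal userspace sketch of the same decomposition. The constant values
are the ones given in section 3.12 of the Unicode Standard; the names used
here (SBASE, hangul_decompose and so on) are made up for this example and are
not the kernel's identifiers. It is only meant as an illustration, not as a
replacement for the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants from the Unicode Standard, section 3.12 (names are illustrative). */
enum {
	SBASE  = 0xAC00,		/* first precomposed Hangul syllable */
	LBASE  = 0x1100,		/* first leading consonant */
	VBASE  = 0x1161,		/* first vowel */
	TBASE  = 0x11A7,		/* one before the first trailing consonant */
	LCOUNT = 19,
	VCOUNT = 21,
	TCOUNT = 28,
	NCOUNT = VCOUNT * TCOUNT,	/* 588 */
	SCOUNT = LCOUNT * NCOUNT	/* 11172 */
};

/*
 * Hypothetical helper: decompose one precomposed Hangul syllable into jamo.
 * Returns the number of code units written to out (2 or 3), or 0 if uc is
 * not a precomposed Hangul syllable. out must have room for 3 entries.
 */
static int hangul_decompose(uint32_t uc, uint16_t *out)
{
	uint32_t index;

	assert(out != NULL);
	if (uc < SBASE || uc >= SBASE + SCOUNT)
		return 0;

	index = uc - SBASE;
	out[0] = (uint16_t)(LBASE + index / NCOUNT);
	out[1] = (uint16_t)(VBASE + (index % NCOUNT) / TCOUNT);
	if (index % TCOUNT == 0)
		return 2;	/* no trailing consonant */
	out[2] = (uint16_t)(TBASE + index % TCOUNT);
	return 3;
}
```

Called on U+ACF5 and U+C720 (the AC F5 C7 20 name mentioned above), this
gives 1100 1169 11BC and 110B 1172 respectively, which is the decomposed
form the patch aims to write to disk instead of the two precomposed
syllables.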