On Thu, 2017-11-23 at 08:32 -0300, Ernesto A. Fernández wrote:
> Hi:
>
> your issue seems to be in the decomposition of hangul characters, not in
> the recomposition before printing. The hfsplus module on linux is saving
> the name of your actor as AC F5 C7 20, without performing any
> decomposition at all.
>
> The reason your patch hides the bug is because it causes linux to present
> filenames as decomposed utf8, so it is not necessary to decompose again
> before working with them. But the issue is still there, and you will most
> likely run into trouble if you make a hangul filename in linux and try
> to work with it in MacOS.
>
> Reviewing the code it would seem that the developers completely forgot
> the hangul characters had their own rules for decomposition. It's weird
> because they did the composition part correctly.
>
> I've made a quick draft of a patch, mostly by copying the code provided
> in the unicode web. I don't think we can actually use it on a

Could you please share the link for "the unicode web"?

Thanks,
Vyacheslav Dubeyko.

> release,
> but it should be enough to check if I'm right. It works fine on linux,
> but I don't have a mac, so it would be great if you could test it for
> me.
>
> Thanks,
> Ernest
>
> (By the way, there is no reason you should have to use the nodecompose
> mount option, as the other reviewer suggested. Using that option will
> have a similar effect to that of your patch. It will hide the problem,
> but if you create a hangul filename on linux with that option you
> probably won't be able to use it on a mac.)
>
> ---
> diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c
> index dfa90c2..9006c61 100644
> --- a/fs/hfsplus/unicode.c
> +++ b/fs/hfsplus/unicode.c
> @@ -272,7 +272,7 @@ static inline int asc2unichar(struct super_block *sb, const char *astr, int len,
>  	return size;
>  }
>  
> -/* Decomposes a single unicode character. */
> +/* Decomposes a single non-Hangul unicode character. */
>  static inline u16 *decompose_unichar(wchar_t uc, int *size)
>  {
>  	int off;
> @@ -296,6 +296,29 @@ static inline u16 *decompose_unichar(wchar_t uc, int *size)
>  	return hfsplus_decompose_table + (off / 4);
>  }
>  
> +/* Decomposes a Hangul unicode character. */
> +int decompose_hangul(wchar_t uc, u16 *result)
> +{
> +	int index;
> +	int l, v, t;
> +
> +	index = uc - Hangul_SBase;
> +	if (index < 0 || index >= Hangul_SCount)
> +		return 0;
> +
> +	l = Hangul_LBase + index / Hangul_NCount;
> +	v = Hangul_VBase + (index % Hangul_NCount) / Hangul_TCount;
> +	t = Hangul_TBase + index % Hangul_TCount;
> +
> +	result[0] = l;
> +	result[1] = v;
> +	if (t != Hangul_TBase) {
> +		result[2] = t;
> +		return 3;
> +	}
> +	return 2;
> +}
> +
>  int hfsplus_asc2uni(struct super_block *sb,
>  		    struct hfsplus_unistr *ustr, int max_unistr_len,
>  		    const char *astr, int len)
> @@ -303,15 +326,23 @@ int hfsplus_asc2uni(struct super_block *sb,
>  	int size, dsize, decompose;
>  	u16 *dstr, outlen = 0;
>  	wchar_t c;
> +	u16 hangul_buf[3];
>  
>  	decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
>  	while (outlen < max_unistr_len && len > 0) {
>  		size = asc2unichar(sb, astr, len, &c);
>  
> -		if (decompose)
> -			dstr = decompose_unichar(c, &dsize);
> -		else
> +		if (decompose) {
> +			/* Hangul is handled separately */
> +			dstr = &hangul_buf[0];
> +			dsize = decompose_hangul(c, dstr);
> +			if (dsize == 0)
> +				/* not Hangul */
> +				dstr = decompose_unichar(c, &dsize);
> +		} else {
>  			dstr = NULL;
> +		}
> +
>  		if (dstr) {
>  			if (outlen + dsize > max_unistr_len)
>  				break;
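
For anyone who wants to check the arithmetic outside the kernel, here is a
minimal userspace sketch of the same decomposition. It is not part of the
patch: the function name hangul_decompose_sketch and the HANGUL_* macros are
made up for illustration, and the constant values are the ones given in the
Unicode Standard's Hangul syllable decomposition algorithm, which I am
assuming match the Hangul_* definitions that are not visible in the hunks
quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants from the Unicode Standard's Hangul decomposition algorithm.
 * Assumed (not confirmed) to match the Hangul_* names used in the patch. */
#define HANGUL_SBASE  0xAC00u  /* first precomposed syllable */
#define HANGUL_LBASE  0x1100u  /* first leading consonant */
#define HANGUL_VBASE  0x1161u  /* first vowel */
#define HANGUL_TBASE  0x11A7u  /* one before the first trailing consonant */
#define HANGUL_TCOUNT 28u      /* trailing slots, including "none" */
#define HANGUL_NCOUNT 588u     /* 21 vowels * 28 trailing slots */
#define HANGUL_SCOUNT 11172u   /* 19 leading consonants * HANGUL_NCOUNT */

/* Hypothetical helper: writes the jamo for a precomposed syllable into
 * out[] and returns how many were written (2 or 3), or 0 if uc is not a
 * precomposed Hangul syllable. */
static int hangul_decompose_sketch(uint32_t uc, uint16_t out[3])
{
	uint32_t index;

	assert(out != NULL);

	if (uc < HANGUL_SBASE || uc >= HANGUL_SBASE + HANGUL_SCOUNT)
		return 0;

	index = uc - HANGUL_SBASE;
	out[0] = (uint16_t)(HANGUL_LBASE + index / HANGUL_NCOUNT);
	out[1] = (uint16_t)(HANGUL_VBASE +
			    (index % HANGUL_NCOUNT) / HANGUL_TCOUNT);

	/* A remainder of zero means the syllable has no trailing consonant. */
	if (index % HANGUL_TCOUNT == 0)
		return 2;

	out[2] = (uint16_t)(HANGUL_TBASE + index % HANGUL_TCOUNT);
	return 3;
}
```

With these constants, the two code units from the report (AC F5 and C7 20)
should come out as 1100 1169 11BC and 110B 1172 respectively, which is the
decomposed form one would expect to find on disk.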