On Tue, 2017-11-28 at 12:02 -0300, Ernesto A. Fernández wrote: > On Mon, Nov 27, 2017 at 02:40:53PM -0800, Viacheslav Dubeyko wrote: > > > > On Mon, 2017-11-27 at 16:36 -0300, Ernesto A. Fernández wrote: > > > > > > It would be good to have some feedback on the use of code under > > > the > > > Unicode License, I don't know if that can be a problem. > > > > > > To Ting-Chang Hou: This patch is slightly different from the one > > > you > > > tested with your mac. You are still tagged as Tested-by, but you > > > may > > > want to check again that I have not broken anything. > > > > > > Thanks, > > > Ernest > > > > > > -- >8 -- > > > Subject: [PATCH] hfsplus: fix decomposition of Hangul characters > > > > > > Files created under macOS become essentially unusable under linux > > > if > > > their names contain any Hangul, and vice versa. > > > > > > This happens because the normalization of Hangul characters is a > > > special > > > case: it can be done algorithmically, without need of a table. > > > The > > > hfsplus module deals with Hangul correctly when composing, but > > > completely forgets about it when performing a decomposition. > > > > > > Solve this by using the Hangul decomposition function provided in > > > the > > > Unicode Standard. It's under the Unicode License, compatible with > > > the > > > GPL. > > > > > I believe it makes sense to share more details about algorithm > > details. > > I think that the Unicode standard contains some explanation how to > > decompose correctly. It will be great to have some brief citation > > in the > > patch description. > The algorithm is very simple, the best way to understand it is just > looking at the code. I don't know the first thing about Korean > writing, so > I don't think I should attempt to explain why the decomposition is > done > this way. If somebody else is interested in the details, they can > follow > the citation in the header comment of the decompose_hangul function. > You should share the link where it can be found. Finally, the comment needs in more detailed explanation. > > > > > > > > This patch will cause trouble for Hangul filenames already > > > created by > > > the module in the past. This shouldn't be a special concern > > > because the > > > main purpose of the module was always sharing with macOS. If a > > > user > > > actually needs to access such a file the nodecompose mount option > > > should > > > be enough. > > > > > > Reported-by: Ting-Chang Hou <tchou@xxxxxxxxxxxx> > > > Tested-by: Ting-Chang Hou <tchou@xxxxxxxxxxxx> > > > Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail. > > > com> > > > --- > > > fs/hfsplus/unicode.c | 62 > > > +++++++++++++++++++++++++++++++++++++++++++++++----- > > > 1 file changed, 56 insertions(+), 6 deletions(-) > > > > > > diff --git a/fs/hfsplus/unicode.c b/fs/hfsplus/unicode.c > > > index dfa90c2..7ceae91 100644 > > > --- a/fs/hfsplus/unicode.c > > > +++ b/fs/hfsplus/unicode.c > > > @@ -272,8 +272,8 @@ static inline int asc2unichar(struct > > > super_block *sb, const char *astr, int len, > > > return size; > > > } > > > > > > -/* Decomposes a single unicode character. */ > > > -static inline u16 *decompose_unichar(wchar_t uc, int *size) > > > +/* Decomposes a non-Hangul unicode character. */ > > > +static u16 *decompose_nonhangul(wchar_t uc, int *size) > > > { > > > int off; > > > > > > @@ -296,6 +296,51 @@ static inline u16 *decompose_unichar(wchar_t > > > uc, int *size) > > > return hfsplus_decompose_table + (off / 4); > > > } > > > > > > +/* > > > + * Decomposes a Hangul unicode character. > > > + * > > > + * This function was adapted from sample code in section 3.12 of > > > the > > > + * Unicode Standard, version 10.0. > > > + * > > > + * Copyright (C) 1991-2017 Unicode, Inc. All rights > > > reserved. Distributed > > > + * under the Terms of Use in http://www.unicode.org/copyright.ht > > > ml. > > > + */ > > > +static int decompose_hangul(wchar_t uc, u16 *result) > > > +{ > > > + int index; > > > + int l, v, t; > > > + > > > + index = uc - Hangul_SBase; > > > + if (index < 0 || index >= Hangul_SCount) > > > + return 0; > > > + > > > + l = Hangul_LBase + index / Hangul_NCount; > > > + v = Hangul_VBase + (index % Hangul_NCount) / > > > Hangul_TCount; > > > + t = Hangul_TBase + index % Hangul_TCount; > > > + > > > + result[0] = l; > > > + result[1] = v; > > > + if (t != Hangul_TBase) { > > > + result[2] = t; > > > + return 3; > > > + } > > > + return 2; > > > +} > > > + > > > +/* Decomposes a single unicode character. */ > > > +static u16 *decompose_unichar(wchar_t uc, int *size, u16 > > > *hangul_buffer) > > > +{ > > > + u16 *result; > > > + > > > + /* Hangul is handled separately */ > > > + result = hangul_buffer; > > > + *size = decompose_hangul(uc, result); > > Is it possible to distinguish the different cases beforehand and to > > select one proper function for execution without the execution both > > ones > > in the worst case? > The decompose_hangul and decompose_nonhangul functions are called > only once, > so I would expect the compiler to inline them both. The result should > be > similar to your suggestion. > It will be much better to have: If (hangful) { <decompose_hangul> } else { <decompose_nonhangul> } Currently, the code looks weird. Especially, decompose_hangul() is able to distinguish two different cases. So, it's easy to extract the check. > > > > > > > > + if (*size == 0) > > > + /* not Hangul */ > > > + result = decompose_nonhangul(uc, size); > > I think it makes sense to use the brackets here. If I remember > > correctly, this is the code style's requirement. > I couldn't find such a requirement, but I did find some other places > within > the hfsplus code where the comments are done my way. For example, in > the > hfsplus_cat_write_inode() function of the inode.c file. > The hfsplus code contains many awful places. It doesn't mean that it needs to follow the bad style examples. If you have only one string: if (condition) <do_something> then it's OK not to use the brackets. Otherwise, you should include the branch into the brackets. Thanks, Vyacheslav Dubeyko.