Re: [PATCH] hfsplus: fix decomposition of Hangul characters

Ernesto A. Fernández <ernesto.mnd.fernandez@xxxxxxxxx> · Tue, 28 Nov 2017 15:51:02 -0300

On Tue, Nov 28, 2017 at 04:15:18PM +0000, Hin-Tak Leung wrote:
> On Tue, 28/11/17, Ernesto A. Fernández <ernesto.mnd.fernandez@xxxxxxxxx> wrote:
> 
> > The algorithm is very simple, the best way to understand it is just
> > looking at the code. I don't know the first thing about Korean
> > writing, so I don't think I should attempt to explain why the
> > decomposition is done this way. If somebody else is interested in the
> > details, they can follow the citation in the header comment of the
> > decompose_hangul function.
>  
> Apologies for coming into this a bit late.
> 
> A couple of points:
> 
> 1. Hangul canonical composition and decomposition are a separate topic from the composition of Latin characters with accents. The topic is described in
> 
>  http://www.unicode.org/reports/tr15/tr15-18.html#Hangul
> 
> among other sources.

I am aware of that; it is the reason this patch exists. If you check the
commit message, you will see it reads:

    "This happens because the normalization of Hangul characters is
    a special case"

> 2. I think the mount option is a bit of a red herring. I think we should just do what Mac OS X does: as I recall, the tech note says something about always storing names in either the decomposed or the composed form. Ideally we should make the differing mount options no-ops. Mac OS X does not need extra mount options, and we shouldn't either.

macOS stores filenames in NFD form, that is, decomposed. The problem
here was that Linux was not decomposing Hangul syllables.
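
For anyone who wants to see the arithmetic without opening the patch, here
is a minimal user-space sketch of the Hangul syllable decomposition as the
Unicode standard describes it. It is not the code from the patch: the name
hangul_decompose_sketch and the HANGUL_* macros are made up for this
illustration, and the kernel's decompose_hangul may be structured
differently. The base values and counts are the ones given by the standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Constants from the Unicode standard's Hangul syllable algorithm. */
#define HANGUL_SBASE  0xAC00 /* first precomposed syllable */
#define HANGUL_LBASE  0x1100 /* first leading consonant */
#define HANGUL_VBASE  0x1161 /* first vowel */
#define HANGUL_TBASE  0x11A7 /* one before the first trailing consonant */
#define HANGUL_LCOUNT 19
#define HANGUL_VCOUNT 21
#define HANGUL_TCOUNT 28
#define HANGUL_NCOUNT (HANGUL_VCOUNT * HANGUL_TCOUNT) /* 588 */
#define HANGUL_SCOUNT (HANGUL_LCOUNT * HANGUL_NCOUNT) /* 11172 */

/*
 * Hypothetical helper, for illustration only.
 * Decomposes one precomposed Hangul syllable into its jamo.
 * Returns the number of units written to out (2 or 3), or 0 if
 * c is not a precomposed Hangul syllable.
 */
static size_t hangul_decompose_sketch(uint16_t c, uint16_t out[3])
{
	unsigned int index, l, v, t;

	if (c < HANGUL_SBASE || c >= HANGUL_SBASE + HANGUL_SCOUNT)
		return 0;

	index = c - HANGUL_SBASE;
	l = index / HANGUL_NCOUNT;                   /* leading consonant */
	v = (index % HANGUL_NCOUNT) / HANGUL_TCOUNT; /* vowel */
	t = index % HANGUL_TCOUNT;                   /* trailing consonant, 0 if none */

	out[0] = (uint16_t)(HANGUL_LBASE + l);
	out[1] = (uint16_t)(HANGUL_VBASE + v);
	if (t == 0)
		return 2;
	out[2] = (uint16_t)(HANGUL_TBASE + t);
	return 3;
}
```

With this sketch, U+D55C comes out as U+1112 U+1161 U+11AB, and U+AC00,
which has no trailing consonant, comes out as U+1100 U+1161.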

The mount option has nothing to do with this patch, other than the fact
that it could later be used to access Hangul filenames that were
mistakenly stored without decomposition. That is a very unlikely situation.

As to why the mount option exists, I couldn't tell you. It predates the
git history. It is disabled by default anyway, so it's not bothering
anyone.

Thanks,
Ernest